ABSTRACT

We study the convergence properties of reduced Hessian successive quadratic programming for equality constrained optimization. The method uses a backtracking line search, and updates an approximation to the reduced Hessian of the Lagrangian by means of the BFGS formula. Two merit functions are considered for the line search: the $\ell_1$ function and the Fletcher exact penalty function. We give conditions under which local and superlinear convergence is obtained, and also prove a global convergence result. The analysis allows the initial reduced Hessian approximation to be any positive definite matrix, and does not assume that the iterates converge, or that the matrices are bounded. The effects of a second order correction step, of a watchdog procedure and of the choice of null space basis are considered. This work can be seen as an extension of the well known results of Powell (1976) for unconstrained optimization to reduced Hessian methods.

Key words. constrained optimization, reduced Hessian methods, quasi-Newton methods, successive quadratic programming, nonlinear programming

AMS(MOS)  subject  classification.  65,  49 


1.  Introduction. 

In  this  paper  we  analyze  reduced  Hessian  successive  quadratic  programming  methods  for 
solving  the  equality  constrained  optimization  problem 


$$\min_{x \in \mathbf{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0, \tag{1.1}$$

where $f : \mathbf{R}^n \to \mathbf{R}$ and $c : \mathbf{R}^n \to \mathbf{R}^t$ are smooth nonlinear functions. These methods, which we also refer to as reduced Hessian methods, generate at $x_k$ a search direction $d_k$ by solving the quadratic program

$$\min_{d \in \mathbf{R}^n} \; g(x_k)^T d + \tfrac{1}{2} d^T Z_k B_k Z_k^T d \quad \text{subject to} \quad c(x_k) + A(x_k)^T d = 0, \tag{1.2}$$

where $g$ is the gradient of $f$, $A(x) = [\nabla c_1(x), \ldots, \nabla c_t(x)]$ is the $n \times t$ matrix of constraint gradients, $Z_k$ is a matrix whose columns form an orthonormal basis for the null space of $A(x_k)^T$, and $B_k$ is a matrix that approximates the reduced Hessian of the Lagrangian function. The new iterate is given by

$$x_{k+1} = x_k + \alpha_k d_k,$$

where the steplength $\alpha_k$ is chosen to force progress towards the solution of (1.1). Our goal in this paper is to develop some practical convergence results for reduced Hessian methods in which $B_k$ is updated by the BFGS formula and the initial matrix $B_0$ is an arbitrary positive definite matrix.

Reduced  Hessian  methods  are  a  special  case  of  successive  quadratic  programming 
(SQP)  methods,  which  are  based  on  the  subproblem 


$$\min_{d \in \mathbf{R}^n} \; g(x_k)^T d + \tfrac{1}{2} d^T M_k d \quad \text{subject to} \quad c(x_k) + A(x_k)^T d = 0. \tag{1.3}$$

Specifically, problem (1.2) is equivalent to a problem of the form (1.3) with $M_k = Z_k B_k Z_k^T$. The general equality constrained quadratic program (1.3) is equivalent to a problem of the form (1.2) if and only if $Z_k^T M_k A(x_k) = 0$.

Solving problem (1.1) by iterative solution of (1.3) is an old idea since, if $M_k = \nabla^2_{xx} L(x_k, \lambda_k)$ and $\lambda_k$ is the multiplier vector of the quadratic program at iteration $k-1$, this is equivalent to Newton's method on the Kuhn-Tucker conditions for (1.1). An alternative is to try to make $M_k$ a secant approximation to the Hessian of the Lagrangian, using a positive definite secant update such as BFGS or DFP. That is, $M_k$ would be updated so that $M_{k+1} s_k = y_k$, where $s_k = x_{k+1} - x_k$, and $y_k$ is some vector approximately equal to $\nabla^2_{xx} L(x_k, \lambda_k) s_k$, such as $\nabla_x L(x_{k+1}, \lambda_k) - \nabla_x L(x_k, \lambda_k)$. This idea cannot be carried out in a straightforward fashion since the Hessian of the Lagrangian at a solution of (1.1) is not necessarily positive definite. Several approaches have been proposed for coping with this difficulty, and reduced Hessian SQP is one of these. Before discussing reduced Hessian methods, we briefly mention some other approaches which instead solve a problem of the form (1.3) with $M_k$ an $n \times n$ positive definite matrix.

An early proposal is to update $M_k$ so as to approximate the Hessian of the augmented Lagrangian, $\nabla^2_{xx} L(x_k, \lambda_k) + \rho A_k A_k^T$, which is positive definite near the solution if the scalar $\rho$ is chosen sufficiently large. This was analyzed by Han (1976), Tapia (1977), and Glad (1979), who showed that if a sufficiently large value of the augmentation parameter is used, and if $x_0$ and $M_0$ are good enough approximations to the solution and to the Hessian of the augmented Lagrangian, respectively, then the iterates converge Q-superlinearly to the solution. A different approach, due to Powell, is to update the matrix only part way so that $M_{k+1} s_k = \theta y_k + (1 - \theta) M_k s_k$, where $\theta \in [0, 1]$ is chosen to preserve a degree of positive definiteness. Powell (1978) proves that if $\{x_k\}$ converges to the solution, and if the sequences $\{\|M_k\|\}$ and $\{\|(Z_k^T M_k Z_k)^{-1}\|\}$ are bounded, then the convergence rate is R-superlinear. The same result is proved by Fenyes (1987) for his updating scheme, which preserves positive definiteness only of $Z_k^T M_k Z_k$. Boggs and Tolle (1985) suggest that $M_k$ simply be left unchanged in cases when updating would cause a loss of positive definiteness. They prove that if $\{x_k\}$ converges to the solution Q-linearly, and if the directions produced by the algorithm converge sufficiently fast to the null space of the constraint derivatives, then $\{x_k\}$ converges Q-superlinearly.

The reduced Hessian approach is motivated by the fact that near the solution $Z_k^T \nabla^2_{xx} L(x_k, \lambda_k) Z_k$ is usually positive definite, and thus it is reasonable to approximate this matrix using a positive definite update formula. In this case the matrix $B_k$ of (1.2) would be updated so that $B_{k+1} s_k = y_k$, where $s_k = Z_k^T (x_{k+1} - x_k)$ and $y_k$ is a secant approximation to $Z_k^T \nabla^2_{xx} L(x_k, \lambda_k) Z_k s_k$. The approach also has the advantage that, when $n - t$ is small relative to $n$, the Hessian approximation that needs to be stored is smaller. Reduced Hessian updating methods have been proposed by Murray and Wright (1978), Gabay (1982), Gilbert (1987), Coleman and Conn (1984), and Nocedal and Overton (1985). For the last two approaches, their proposers prove that if $x_0$ and $B_0$ are good enough approximations to the solution and to the reduced Hessian of the Lagrangian, respectively, then the iterates converge 2-step Q-superlinearly to the solution. These two approaches differ primarily in the choice of $y_k$; that of Coleman and Conn is more costly in function evaluations, but is probably more robust than that of Nocedal and Overton (which is closer to the first two approaches mentioned). Actually, Coleman and Conn consider two versions of their algorithm; here we are referring to the version that uses only one constraint evaluation in the step computation. We also note that Fontecilla (1988) proposes a full Hessian method analogous to the algorithm of Coleman and Conn and proves a similar convergence result.

Most of these methods work reasonably well in most cases, but none of them is regarded as completely satisfactory in theory or in practice (see Powell (1987)). Note that all the above mentioned analyses either assume a good initial approximation to the solution and to the Hessian of the Lagrangian at the solution, or they assume that the iterates converge and that the Hessian approximations are bounded in some way. We regard these assumptions as undesirable since it is not known when they will be satisfied in practice. The objective of this work is to develop a convergence theory for reduced Hessian successive quadratic programming that assumes only that the initial matrix is positive definite, and does not assume that the iterates converge. Since we are making no assumptions on $B_k$ or on the convergence of the iterates, there is no guarantee that $x_k + d_k$ is closer to the solution $x_*$ than $x_k$ is. In practice a line search is usually relied on to force progress towards the solution. This is done by using a merit function $\phi(x)$, and by computing the steplength $\alpha_k$ so that $\phi(x_k + \alpha_k d_k)$ is significantly less than $\phi(x_k)$.

We will analyze a procedure of this type and show that, under certain conditions, if $x_1$ is within a neighborhood of $x_*$ this decrease in the merit function will force $\{x_k\}$ to converge to $x_*$ R-linearly, whereupon known results will imply that the convergence
is  superlinear.  Thus  our  work  will  be  somewhat  analogous  to  the  well  known  paper  of 
Powell  (1976)  on  the  convergence  of  the  BFGS  method  with  inexact  line  search  for  a 
convex  objective  function.  We  have  chosen  to  consider  reduced  Hessian  approaches  here 
primarily  because  the  issues  we  are  interested  in  are  simpler  to  deal  with  than  for  full 
Hessian  approaches.  Also  for  simplicity  we  have  chosen  to  analyze  an  updating  strategy 
like  that  of  Coleman  and  Conn,  but  many  of  our  results  can  probably  be  extended  to  the 
more  complex  Nocedal  and  Overton  strategy. 

The  algorithm  to  be  studied  is  defined  in  Section  2,  and  the  methods  for  updating 
Bk  and  for  performing  the  line  search  are  laid  out  precisely.  We  consider  two  merit 
functions, the $\ell_1$ function proposed as a merit function in Han (1977), and the Fletcher
(1970),  (1973)  exact  penalty  function. 

In  Section  3  general  results  of  Byrd  and  Nocedal  (1987)  on  the  BFGS  update  are 
used  to  show  that,  if  an  adequate  line  search  is  done,  then  the  merit  function  is  decreased 
significantly  for  at  least  a  fraction  of  iterates.  This  fact  is  then  used  to  prove  a  somewhat 
weak  global  convergence  result.  The  effect  of  choice  of  the  weight  in  the  merit  function 
is  taken  into  consideration. 

In  Section  4  we  consider  the  local  behavior  of  the  algorithm  near  a  point  satisfying 
the  standard  strong  sufficiency  conditions.  We  prove  that,  once  the  algorithm  gets  close 
enough  to  such  a  point  it  will  converge  R-linearly.  The  convergence  results  here  and  in 
Section 3 are somewhat more satisfactory for the $\ell_1$ merit function than for the Fletcher
function. 

In  Section  5  we  study  superlinear  convergence.  We  consider  the  effect  of  the  choice 
of null space basis $Z_k$ on convergence rate, and look for conditions under which the
algorithm takes unit steplengths near the solution. This is not a problem for the Fletcher
function, but for the $\ell_1$ function the algorithm needs to be modified. We consider two
modifications, the correction step and the watchdog technique, and show that they allow
for unit steplengths near the solution, which ensures a two-step Q-superlinear rate of
convergence. 

Notation.  The  Lagrangian  function  will  be  defined  by 

$$L(x, \lambda) = f(x) + \lambda^T c(x), \tag{1.4}$$

and we denote the reduced Hessian of the Lagrangian by $G_k$, i.e.

$$G_k = Z_k^T \nabla^2_{xx} L(x_k, \lambda_k) Z_k. \tag{1.5}$$

Throughout the paper $\|\cdot\|$ denotes the $\ell_2$ vector norm or the corresponding induced matrix norm. When using the $\ell_1$ or $\ell_\infty$ norms we will indicate it explicitly by writing $\|\cdot\|_1$ or $\|\cdot\|_\infty$. We recall that the $\ell_1$ and $\ell_\infty$ norms are duals of each other, so that $\lambda^T c \le \|\lambda\|_\infty \|c\|_1$. A solution of the problem (1.1) is denoted by $x_*$, and we let $e_k = x_k - x_*$.


2.  Reduced  Hessian  Methods  with  Line  Search 

Now we describe a general reduced Hessian SQP algorithm of the type discussed in §1. We denote the merit function by $\phi$ and its directional derivative at $x$ in the direction $d$ by $D\phi(x; d)$. The precise form of $\phi$ will be discussed later.

Algorithm 2.1

The constants $\eta \in (0, \tfrac{1}{2})$ and $\tau, \tau'$ with $0 < \tau < \tau' < 1$ are given.

(1) Set $k = 1$ and choose a starting point $x_1$ and a symmetric and positive definite starting matrix $B_1$.

(2) Compute $Z_k$ and obtain $d_k$ by solving the quadratic program

$$\min_{d \in \mathbf{R}^n} \; g_k^T d + \tfrac{1}{2} d^T Z_k B_k Z_k^T d \quad \text{subject to} \quad c_k + A_k^T d = 0. \tag{2.1}$$

(3) Set $\alpha_k = 1$.

(4) Test the line search condition

$$\phi(x_k + \alpha_k d_k) \le \phi(x_k) + \eta \alpha_k D\phi(x_k; d_k). \tag{2.2}$$

(5) If (2.2) is not satisfied, choose a new $\alpha_k$ in $[\tau \alpha_k, \tau' \alpha_k]$ and go to (4); otherwise set

$$x_{k+1} = x_k + \alpha_k d_k. \tag{2.3}$$

(6) Compute

$$s_k = Z_k^T (x_{k+1} - x_k), \tag{2.4}$$

$$y_k = Z_k^T [\nabla_x L(x_k + \alpha_k h_k, \lambda_k) - \nabla_x L(x_k, \lambda_k)], \tag{2.5}$$

where $\lambda_k$ is chosen so that (2.12) is satisfied. If $s_k \ne 0$ update $B_k$ using the BFGS formula

$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}. \tag{2.6}$$

(7) Set $k := k + 1$, and go to (2).
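The steps of Algorithm 2.1 can be sketched in a few lines for the smallest nontrivial case, $n = 2$ and $t = 1$, where $Z_k$ is a single unit vector and $B_k$ is a scalar (so that the BFGS formula (2.6) collapses to $B_{k+1} = y_k / s_k$). The test problem, the function names, the parameter values and the two safeguards marked in the comments are assumptions made for illustration; they are not taken from the paper. The sketch uses the $\ell_1$ merit function (2.19), the weight rule (2.27) and the least squares multiplier (2.21).

```python
import math

# Hypothetical test problem (not from the paper):
#   minimize x1 + x2  subject to  x1^2 + x2^2 - 2 = 0,
# whose solution is x* = (-1, -1) with multiplier 1/2.
def f(x):
    return x[0] + x[1]

def grad_f(x):
    return (1.0, 1.0)

def c(x):
    return x[0] ** 2 + x[1] ** 2 - 2.0

def grad_c(x):
    return (2.0 * x[0], 2.0 * x[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def axpy(x, t, d):
    """Return x + t * d for 2-vectors."""
    return (x[0] + t * d[0], x[1] + t * d[1])

def reduced_hessian_sqp(x, B=1.0, eta=0.1, tau=0.5, rho=1.0,
                        tol=1e-8, max_iter=100):
    mu = 0.0
    for _ in range(max_iter):
        g, a, ck = grad_f(x), grad_c(x), c(x)
        norm_a = math.hypot(a[0], a[1])
        z = (-a[1] / norm_a, a[0] / norm_a)       # unit vector with a^T z = 0
        zg = dot(z, g)                            # reduced gradient Z^T g
        if abs(zg) + abs(ck) < tol:
            break
        h = axpy((0.0, 0.0), -zg / B, z)          # null space step (2.8)
        v = axpy((0.0, 0.0), -ck / dot(a, a), a)  # range space step (2.9)
        d = (h[0] + v[0], h[1] + v[1])            # full step (2.7)
        lam = -dot(a, g) / dot(a, a)              # least squares multiplier (2.21)
        if mu < abs(lam) + rho:                   # weight rule (2.27)
            mu = abs(lam) + 2.0 * rho
        phi = lambda p: f(p) + mu * abs(c(p))     # l1 merit function (2.19)
        dphi = dot(g, d) - mu * abs(ck)           # directional derivative (2.24)
        alpha = 1.0
        while (phi(axpy(x, alpha, d)) > phi(x) + eta * alpha * dphi
               and alpha > 1e-12):                # lower bound: safeguard of this sketch
            alpha *= tau                          # simple multiplication by a constant
        s = alpha * dot(z, h)                     # (2.4), a scalar here
        if s != 0.0:
            xh = axpy(x, alpha, h)
            gl_new = axpy(grad_f(xh), lam, grad_c(xh))  # grad_x L(x_k + alpha h_k, lam_k)
            gl_old = axpy(g, lam, a)                    # grad_x L(x_k, lam_k)
            y = dot(z, (gl_new[0] - gl_old[0], gl_new[1] - gl_old[1]))  # (2.5)
            if y * s > 0.0:                       # extra guard, not part of Algorithm 2.1
                B = y / s                         # BFGS (2.6) for a 1x1 matrix
        x = axpy(x, alpha, d)
    return x, B
```

Calling `reduced_hessian_sqp((-1.5, -0.5))` starts the iteration from a point off the constraint. In higher dimensions $Z_k$ would instead come from, for example, a QR factorization of $A_k$, and $B_k$ would be a full $(n-t) \times (n-t)$ matrix.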


The solution to subproblem (2.1), which gives the step direction, may be expressed as

$$d_k = h_k + v_k, \tag{2.7}$$

where

$$h_k = -Z_k B_k^{-1} Z_k^T g_k, \tag{2.8}$$

and

$$v_k = -A_k [A_k^T A_k]^{-1} c_k, \tag{2.9}$$

give an orthogonal decomposition of $d_k$, and where $g_k$ stands for $g(x_k)$, etc. The vector $v_k$ is in the range space of $A_k$ and may be regarded as a minimum norm Newton step on the equation $c(x) = 0$. The vector $h_k$ lies in the null space of $A_k^T$, tends to move toward a stationary point of the Lagrangian and, to first order, leaves the value of $c$ unchanged. Note that the approximation matrix $B_k$ only affects the null space component $h_k$.
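As a quick check that (2.7)-(2.9) does solve (2.1), the following short derivation verifies feasibility, stationarity on the null space, and orthogonality. It uses only the properties stated in the text ($A_k^T Z_k = 0$, $Z_k^T Z_k = I$) together with the assumptions that $A_k$ has full column rank and that $B_k$ is symmetric and positive definite.

```latex
% Feasibility: the linearized constraint of (2.1) holds.
A_k^T d_k = -\underbrace{A_k^T Z_k}_{=0} B_k^{-1} Z_k^T g_k
            - A_k^T A_k [A_k^T A_k]^{-1} c_k = -c_k .

% Stationarity on the null space: the reduced gradient of the
% objective of (2.1) vanishes at d_k (note Z_k^T v_k = 0).
Z_k^T \bigl( g_k + Z_k B_k Z_k^T d_k \bigr)
  = Z_k^T g_k + B_k Z_k^T h_k
  = Z_k^T g_k - B_k B_k^{-1} Z_k^T g_k = 0 .

% Orthogonality of the two components.
h_k^T v_k = g_k^T Z_k B_k^{-1} \underbrace{Z_k^T A_k}_{=0}
            [A_k^T A_k]^{-1} c_k = 0 .
```

Because $B_k$ is positive definite, the objective of (2.1) is strictly convex along the null space of $A_k^T$, so a feasible point with zero reduced gradient is the solution.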

The procedure for choosing a new value of $\alpha$ in step (5) is not specified precisely so that our analysis can cover a variety of line search strategies. There are several procedures, such as a safeguarded interpolation line search algorithm or simple multiplication by a constant, that would give a new $\alpha_k$ in the specified interval. Note that the line search always reduces the steplength and thus $\alpha_k \le 1$ for all $k$. This is common in successive quadratic programming algorithms, and is due to the condition $c(x_k) + A(x_k)^T d_k = 0$.
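One way to realize a safeguarded interpolation in step (5) is sketched below: the rejected trial value is used to build a quadratic model of the merit function along $d_k$, and the model minimizer is then projected onto $[\tau \alpha_k, \tau' \alpha_k]$. The function name, the default values of `tau` and `tau_prime`, and the fallback branch are assumptions of this sketch; the paper does not prescribe a particular rule.

```python
def safeguarded_step(alpha, phi0, dphi0, phi_alpha, tau=0.1, tau_prime=0.9):
    """Propose a new steplength in [tau * alpha, tau_prime * alpha].

    phi0 and dphi0 are the merit value and directional derivative at
    steplength 0 (with dphi0 < 0); phi_alpha is the merit value at the
    rejected trial steplength alpha.
    """
    # Curvature term of the quadratic interpolating phi0, dphi0, phi_alpha.
    curvature = phi_alpha - phi0 - dphi0 * alpha
    if curvature > 0.0:
        candidate = -dphi0 * alpha ** 2 / (2.0 * curvature)
    else:
        # No usable quadratic model: fall back to the mildest reduction.
        candidate = tau_prime * alpha
    # Safeguard: project onto the interval required by step (5).
    return min(max(candidate, tau * alpha), tau_prime * alpha)
```

When (2.2) has failed with $\eta < 1$ and $D\phi < 0$, the curvature term is always positive, so the fallback branch is only a defensive measure.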

In the algorithm, $Z_k$ refers to an $n \times (n - t)$ matrix satisfying $A_k^T Z_k = 0$ and $Z_k^T Z_k = I$. These conditions do not specify $Z_k$ uniquely, and the iteration does depend on our choice of $Z_k$. It turns out, however, that the results in Sections 3 and 4 are true for any choice of $Z_k$, and that only to prove superlinear convergence do we need to place additional restrictions on $Z_k$.

Let us now discuss the choice of the vectors $s_k$ and $y_k$ needed in step (6). Since $B_k$ is meant to be an approximation to the reduced Hessian of the Lagrangian $Z_k^T \nabla^2_{xx} L(x_k, \lambda_k) Z_k$ based on information at $x_k$ and $x_{k+1}$, it is reasonable to define $s_k$ by (2.4), or equivalently by

$$s_k = \alpha_k Z_k^T h_k, \tag{2.10}$$

but we could have replaced $Z_k$ by $Z_{k+1}$ in these expressions. The choice of $y_k$ is less obvious. The formula we use in Algorithm 2.1 is that proposed and analyzed by Coleman and Conn (1984). To motivate this formula for $y_k$ note from (2.10), and from the fact that $Z_k Z_k^T h_k = h_k$, that

$$Z_k^T \nabla^2_{xx} L(x_k, \lambda_k) Z_k s_k = Z_k^T [\nabla^2_{xx} L(x_k, \lambda_k) \alpha_k h_k] \approx Z_k^T [\nabla_x L(x_k + \alpha_k h_k, \lambda_k) - \nabla_x L(x_k, \lambda_k)].$$

Since we want to impose the secant condition $B_{k+1} s_k = y_k$ it is natural to define $y_k$ by (2.5). There are several slight variations of the formula for $y_k$ that could be used. For example we could define

$$y_k = Z_{k+1}^T [\nabla_x L(x_{k+1}, \lambda_{k+1}) - \nabla_x L(x_{k+1} - \alpha_k h_k, \lambda_{k+1})],$$

thereby using the most recent information available. We will only consider the definition (2.5), but the results of this paper also hold for several of these variations.

A significantly different formula for $y_k$ is

$$y_k = Z_{k+1}^T [\nabla_x L(x_{k+1}, \lambda_{k+1}) - \nabla_x L(x_k, \lambda_{k+1})]. \tag{2.11}$$

Formulas of this type have been suggested by Murray and Wright (1978), Gabay (1982), and Nocedal and Overton (1985). An advantage of using (2.11) is that it requires only one evaluation of the derivatives of $f$ and $c$ per iteration as opposed to two evaluations for (2.5). However, Nocedal and Overton note that (2.11) can be subject to instability in some cases, and in their analysis they stipulate that under certain conditions the update be skipped. In this paper we will analyze only the choice (2.5), and leave the formulas like (2.11), whose analysis is more complicated, for subsequent study.

There  are  several  effective  ways  to  estimate  the  Lagrange  multiplier  in  the  Hessian  of 
the Lagrangian. We require only that $\lambda_k$ be chosen so that

$$\|\lambda_k - \lambda_*\| \le \gamma_\lambda \|x_k - x_*\| \tag{2.12}$$

is satisfied for some constant $\gamma_\lambda$. This condition is satisfied by several formulas including

(2.13) 

and 

$$\lambda_k = -[A_k^T A_k]^{-1} [A_k^T g_k - c_k]. \tag{2.14}$$

Powell (1976) has shown that the BFGS method for unconstrained minimization has strong convergence properties if $y_k^T s_k > 0$ for all $k$, and if the sequence $\{y_k^T y_k / y_k^T s_k\}$ is uniformly bounded above. In this paper we will show that these two conditions are also crucial in the analysis of Algorithm 2.1. The following lemma shows that the definition (2.5) of $y_k$ ensures that these two conditions hold near the solution.

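Lemma 2.1 Given an iterate $x_k$, a step $\alpha_k h_k$ and a Lagrange multiplier estimate $\lambda_k$, assume that there exist positive constants $m$, $M$ such that

$$m \|w\|^2 \le w^T [Z_k^T \nabla^2_{xx} L(x, \lambda_k) Z_k] w \le M \|w\|^2, \tag{2.15}$$

for all $w \in \mathbf{R}^{n-t}$, and for all $x$ in the line segment joining $x_k$ and $x_k + \alpha_k h_k$. Then

$$\frac{y_k^T s_k}{s_k^T s_k} \ge m, \tag{2.16}$$

$$\frac{\|y_k\|^2}{y_k^T s_k} \le M. \tag{2.17}$$

Proof: If we define

$$\bar{G}_k = Z_k^T \int_0^1 \nabla^2_{xx} L(x_k + \tau \alpha_k h_k, \lambda_k) \, d\tau \, Z_k,$$

then we have from (2.5)

$$y_k = \bar{G}_k s_k. \tag{2.18}$$

Thus (2.16) and (2.17) can be shown to follow from (2.15).

□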
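The final step of the proof can be spelled out as follows. This is a sketch that writes $\bar{G}_k$ for the averaged reduced Hessian appearing in (2.18), assumes $s_k \ne 0$, and takes (2.16) and (2.17) to be the bounds $y_k^T s_k / s_k^T s_k \ge m$ and $\|y_k\|^2 / y_k^T s_k \le M$, the same form used in Theorem 3.1.

```latex
% Integrating (2.15) over the segment shows that the symmetric matrix
% \bar{G}_k inherits the same bounds:
m \|w\|^2 \le w^T \bar{G}_k w \le M \|w\|^2
\quad \text{for all } w \in \mathbf{R}^{n-t}.

% Lower bound (2.16), using y_k = \bar{G}_k s_k:
y_k^T s_k = s_k^T \bar{G}_k s_k \ge m \|s_k\|^2 .

% Upper bound (2.17), with u = \bar{G}_k^{1/2} s_k:
\frac{\|y_k\|^2}{y_k^T s_k}
  = \frac{s_k^T \bar{G}_k^2 s_k}{s_k^T \bar{G}_k s_k}
  = \frac{u^T \bar{G}_k u}{u^T u} \le M .
```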

We now consider some merit functions to be used in step (4) of the algorithm. The first merit function used in a successive quadratic programming algorithm was the $\ell_1$ merit function (cf. Han (1976))

$$\phi_\mu(x) = f(x) + \mu \|c(x)\|_1. \tag{2.19}$$

Han used the $\ell_1$ norm of $c(x)$, but other choices of norms are possible. An alternative is the differentiable function proposed by Fletcher (1973). It is given by

$$\Phi_\nu(x) = f(x) + \lambda(x)^T c(x) + \tfrac{1}{2} \nu \|c(x)\|^2, \tag{2.20}$$

where

$$\lambda(x) = -[A(x)^T A(x)]^{-1} A(x)^T g(x) \tag{2.21}$$

is the least squares Lagrange multiplier estimate at $x$. To compute the derivative of this
merit function requires second order information, due to the term $\lambda(x)$. However Powell
and  Yuan  (1986)  describe  a  procedure  that  uses  finite  differences  to  approximate  these 
second  order  terms  with  no  extra  evaluation  of  A(x).  In  this  paper  we  will  assume,  for 
simplicity, that the derivative of $\lambda(x)$ is computed exactly.

Boggs  and  Tolle  (1984)  propose  a  merit  function  similar  to  (2.20),  and  most  of  our 
results  for  the  Fletcher  function  can  be  extended  to  their  merit  function,  if  some  additional 
assumptions are made. Other merit functions have been proposed by di Pillo and Grippo,
and  by  Schittkowski  (see  Powell  (1987)  for  a  review),  but  they  will  not  be  studied  in  this 
paper. 

It is essential that the step generated by Algorithm 2.1 define a descent direction for the merit function used, i.e. that $D\phi(x_k; d_k) < 0$. Indeed, in order to establish a linear convergence rate, that quantity must be significantly negative. Therefore, we now calculate these directional derivatives, starting with the $\ell_1$ merit function. Although this merit function is not differentiable everywhere, it does always have a one-sided directional derivative, and for the direction $d_k$ generated by Algorithm 2.1, this takes a particularly simple form, as we now show.

From Taylor's theorem we have

$$\phi_{\mu_k}(x_k + \alpha d_k) - \phi_{\mu_k}(x_k) = f(x_k + \alpha d_k) - f_k + \mu_k \|c(x_k + \alpha d_k)\|_1 - \mu_k \|c_k\|_1$$

$$\le \alpha g_k^T d_k + \mu_k \|c_k + \alpha A_k^T d_k\|_1 - \mu_k \|c_k\|_1 + b_1 \alpha^2 \|d_k\|^2$$

for some positive constant $b_1$. (Note that $b_1$ actually depends on the weight $\mu_k$.) From (2.1) we have that $A_k^T d_k = -c_k$, and therefore, assuming $\alpha \le 1$, we have

$$\phi_{\mu_k}(x_k + \alpha d_k) - \phi_{\mu_k}(x_k) \le \alpha [g_k^T d_k - \mu_k \|c_k\|_1] + \alpha^2 b_1 \|d_k\|^2. \tag{2.22}$$

Similarly, we obtain the lower bound

$$\phi_{\mu_k}(x_k + \alpha d_k) - \phi_{\mu_k}(x_k) \ge \alpha [g_k^T d_k - \mu_k \|c_k\|_1] - \alpha^2 b_1 \|d_k\|^2. \tag{2.23}$$

Taking limits it is therefore clear that

$$D\phi_{\mu_k}(x_k; d_k) = g_k^T d_k - \mu_k \|c_k\|_1. \tag{2.24}$$
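Formula (2.24) can be checked numerically by comparing it with a one-sided difference quotient of the $\ell_1$ merit function along the SQP direction. The sketch below does this for a hypothetical two-variable problem with a single constraint (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$); the problem, the function names, the value of the weight and the difference step `t` are all assumptions made for illustration.

```python
import math

def f(x):
    return x[0] + x[1]

def c(x):
    return x[0] ** 2 + x[1] ** 2 - 2.0

def merit(x, mu):
    """l1 merit function f(x) + mu * |c(x)| for a single constraint."""
    return f(x) + mu * abs(c(x))

def sqp_direction(x, B=1.0):
    """Return (g, d) with d = h + v as in (2.7)-(2.9), for n = 2, t = 1."""
    g = (1.0, 1.0)
    a = (2.0 * x[0], 2.0 * x[1])
    norm_a = math.hypot(a[0], a[1])
    z = (-a[1] / norm_a, a[0] / norm_a)          # orthonormal null space basis
    zg = z[0] * g[0] + z[1] * g[1]
    aa = a[0] * a[0] + a[1] * a[1]
    d = tuple(-zg / B * z[i] - c(x) / aa * a[i] for i in range(2))
    return g, d

def formula_2_24(x, mu):
    """Directional derivative predicted by (2.24)."""
    g, d = sqp_direction(x)
    return g[0] * d[0] + g[1] * d[1] - mu * abs(c(x))

def one_sided_difference(x, mu, t=1e-6):
    """Forward difference quotient of the merit function along d."""
    _, d = sqp_direction(x)
    xt = (x[0] + t * d[0], x[1] + t * d[1])
    return (merit(xt, mu) - merit(x, mu)) / t
```

The agreement holds whether $c(x_k)$ is positive or negative, because the step satisfies $A_k^T d_k = -c_k$ and therefore always moves the linearized constraint value toward zero.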

In order to separate out the effects on the merit function of the null space and range space components of the step we recall the decomposition $d_k = h_k + v_k$, given by (2.7)-(2.9). By (2.9), we have

$$g_k^T v_k = \lambda_k^T c_k, \tag{2.25}$$

where $\lambda_k = \lambda(x_k)$ is given by (2.21) so that

$$D\phi_{\mu_k}(x_k; d_k) = g_k^T h_k - \mu_k \|c_k\|_1 + \lambda_k^T c_k. \tag{2.26}$$


By (2.8) $g_k^T h_k = -g_k^T Z_k B_k^{-1} Z_k^T g_k$, and since the matrices $\{B_k\}$ will be forced to be positive definite, this term is always less than or equal to zero. Therefore to ensure that $d_k$ is a descent direction for $\phi_{\mu_k}$ it is sufficient to require that $\mu_k \ge \|\lambda_k\|_\infty$. Such a condition is very common when using merit functions with sequential quadratic programming methods, and appears for example in the global analysis of Han (1977). If the sequence $\{\lambda_k\}$ is bounded, then a sufficiently large $\mu$ exists satisfying $\mu \ge \|\lambda_k\|_\infty$ for all $k$. Since, however, this value is not known in advance, at each step the weight $\mu_k \ge \|\lambda_k\|_\infty$ should be chosen in such a way that it eventually becomes fixed. One way to do this is to choose $\mu_k$ at each iterate as follows:

$$\mu_k = \begin{cases} \mu_{k-1} & \text{if } \mu_{k-1} \ge \|\lambda_k\|_\infty + \rho, \\ \|\lambda_k\|_\infty + 2\rho & \text{otherwise,} \end{cases} \tag{2.27}$$

where $\rho$ is some positive constant.
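The weight rule (2.27) is simple enough to state as a small function. The name `update_weight` and the representation of the multiplier estimate as a list of floats are assumptions of this sketch.

```python
def update_weight(mu_prev, lam, rho):
    """Weight rule (2.27) for the l1 merit function.

    mu_prev is the previous weight, lam the current multiplier estimate
    (a non-empty sequence of floats), and rho a positive constant.
    """
    lam_inf = max(abs(component) for component in lam)  # infinity norm
    if mu_prev >= lam_inf + rho:
        return mu_prev            # weight is already large enough: keep it
    return lam_inf + 2.0 * rho    # otherwise raise it above the threshold
```

Each change raises the weight by more than `rho`, so when the multiplier estimates stay bounded the weight can change only finitely many times.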

From now on we will assume that when the $\ell_1$ merit function $\phi_{\mu_k}$ is used in Algorithm 2.1, the weight $\mu_k$ is chosen by (2.27). Therefore, for any $x_k$, $D\phi_{\mu_k}(x_k; d_k) < 0$, unless $Z_k^T g_k = 0$ and $c_k = 0$, which can occur only at a stationary point of problem (1.1).

As mentioned above, one could use other norms than the $\ell_1$ norm in this merit function. In fact, all of the results and proofs in this paper involving the merit function (2.19) remain valid if the $\ell_1$ norm is replaced with the $\ell_p$ norm for $p \in [1, \infty]$, provided that the $\ell_\infty$ norm in (2.27) and elsewhere is replaced with the dual norm $\ell_q$, where $\frac{1}{p} + \frac{1}{q} = 1$. However, we will continue to write $\ell_1$ norm for simplicity.


We now consider Fletcher's merit function (2.20). Since this function is differentiable we have

$$\nabla \Phi_{\nu_k}(x_k) = g_k + A_k \lambda_k + (\lambda'_k)^T c_k + \nu_k A_k c_k, \tag{2.28}$$

where $\lambda'_k$ is the $t \times n$ matrix whose rows are the gradients of the Lagrange multiplier estimates. Thus, using (2.1) and (2.25) we have

$$D\Phi_{\nu_k}(x_k; d_k) = g_k^T d_k - \lambda_k^T c_k + c_k^T \lambda'_k d_k - \nu_k \|c_k\|^2 = g_k^T h_k + c_k^T \lambda'_k d_k - \nu_k \|c_k\|^2. \tag{2.29}$$

Again, as with the $\ell_1$ merit function, the first term is non-positive. It is also clear that, for any $k$, $\nu_k$ can be chosen large enough so that (2.29) is less than or equal to zero. However the algorithm for choosing $\nu_k$ is more complex than (2.27), and we defer discussion of this issue to the next section, where we analyze the convergence of the algorithm.

3.  Global  Behavior  of  the  Algorithm 

We  now  consider  the  convergence  properties  of  the  reduced  Hessian  SQP  algorithm 
defined in Section 2. We will show that, for a fraction of the steps, significant decrease in
the  merit  function  can  be  obtained,  and  that  under  appropriate  assumptions  this  implies 
global  convergence. 

Equations (2.26) and (2.29) indicate that the direction generated by the algorithm is a descent direction for the two merit functions if $\mu_k$ and $\nu_k$ are sufficiently large and if $g_k^T h_k = g_k^T Z_k Z_k^T h_k < 0$. Therefore the null space component $h_k$ must make an acute angle with the projection of $-g_k$ onto the null space, $-Z_k Z_k^T g_k$. In order to quantify the decrease in the merit function obtained in a step of the algorithm, we will consider closely this angle, which is defined by

$$\cos\theta_k = \frac{-(Z_k Z_k^T g_k)^T h_k}{\|Z_k Z_k^T g_k\| \|h_k\|} = \frac{-g_k^T h_k}{\|Z_k^T g_k\| \|h_k\|}, \tag{3.1}$$

since $\|Z_k Z_k^T g_k\| = \|Z_k^T g_k\|$. Therefore, from (2.26) and (2.29) we have

$$D\phi_{\mu_k}(x_k; d_k) = -\|Z_k^T g_k\| \|h_k\| \cos\theta_k - \mu_k \|c_k\|_1 + \lambda_k^T c_k, \tag{3.2}$$

and

$$D\Phi_{\nu_k}(x_k; d_k) = -\|Z_k^T g_k\| \|h_k\| \cos\theta_k + c_k^T \lambda'_k d_k - \nu_k \|c_k\|^2. \tag{3.3}$$

From these relations it can be seen that for $h_k$ to provide significant descent we must require that $\cos\theta_k$ not be too close to zero and that $h_k$ not be too small in norm. Both these quantities depend very strongly on the reduced Hessian approximation $B_k$. By equation (2.8), $h_k$ is computed so that $B_k Z_k^T h_k = -Z_k^T g_k$ and so by (2.10) we have that $B_k s_k = -\alpha_k Z_k^T g_k$. Therefore $\cos\theta_k$ can also be written as

$$\cos\theta_k = \frac{s_k^T B_k s_k}{\|s_k\| \|B_k s_k\|}, \tag{3.4}$$

and we have that

$$\frac{\|h_k\|}{\|Z_k^T g_k\|} = \frac{\|s_k\|}{\|B_k s_k\|}. \tag{3.5}$$

The  following  theorem,  which  is  proved  by  Byrd  and  Nocedal  (1987),  establishes 
bounds  on  these  quantities  that  hold  for  a  fraction  of  the  iterates. 


Theorem 3.1 Let $\{B_k\}$ be generated by the BFGS formula (2.6) where, for all $k \ge 1$, $s_k \ne 0$ and

$$\frac{y_k^T s_k}{s_k^T s_k} \ge m > 0,$$

$$\frac{\|y_k\|^2}{y_k^T s_k} \le M.$$

Then, for any $p \in (0, 1)$, there exist constants $\beta_1, \beta_2, \beta_3 > 0$ such that, for any $k \ge 1$, the relations

$$\cos\theta_j \ge \beta_1, \tag{3.6}$$

$$\beta_2 \le \frac{\|B_j s_j\|}{\|s_j\|} \le \beta_3 \tag{3.7}$$

hold for at least $\lceil pk \rceil$ values of $j \in [1, k]$.


This theorem, which is basic for the analysis of this paper, implies that a fraction $p$ of the iterates with $s_k \ne 0$ are such that the null space component $h_k$ gives a significant reduction in the merit function. Later we will see that the iterates with $s_k = 0$ also contribute significantly to the decrease in the merit function. Since it will be useful to refer easily to these two classes of iterates, we will assign a value to $p$ and make the following definition.

Definition 3.1 Let $p$ of Theorem 3.1 have the value p= |. We define $J$ to be the set of iterates for which (3.6) and (3.7) hold, or for which $s_k = 0$. We will call $J$ the set of "good" iterates.

This definition and Theorem 3.1 imply that $J \cap [1, k]$ contains at least $\lceil pk \rceil$ iterates.

We  are  now  ready  to  analyze  the  global  behavior  of  the  algorithm.  We  use  the  term 
global  because  we  do  not  explicitly  assume  that  the  iterates  are  near  the  solution,  but 
only  make  the  following  assumptions. 




Assumptions 3.1 The sequence $\{x_k\}$ generated by the algorithm is contained in a convex set $D$ with the following properties.

(1) The functions $f : \mathbf{R}^n \to \mathbf{R}$ and $c : \mathbf{R}^n \to \mathbf{R}^t$ and their first and second derivatives are uniformly bounded in norm over $D$.

(2) The matrix $A(x)$ has full column rank for all $x \in D$, and there is a constant $\gamma_0$ such that

$$\|A(x)[A(x)^T A(x)]^{-1}\| \le \gamma_0 \tag{3.8}$$

for all $x \in D$.

(3) For all $k \ge 1$ for which $s_k \ne 0$ we have

$$\frac{y_k^T s_k}{s_k^T s_k} \ge m > 0, \tag{3.9}$$

$$\frac{\|y_k\|^2}{y_k^T s_k} \le M. \tag{3.10}$$

The following lemma on the relation between $\|h\|$ and $\|Z^T g\|$, for the good iterates, will be useful in deriving bounds on the directional derivative of the merit functions in the SQP direction. This lemma does not depend on the merit function used.

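Lemma 3.2 Suppose that the iterates $\{x_k\}$ generated by Algorithm 2.1 satisfy Assumptions 3.1. Then for any $j \in J$

$$\frac{1}{\beta_3} \|Z_j^T g_j\| \le \|h_j\| \le \frac{1}{\beta_2} \|Z_j^T g_j\|, \tag{3.11}$$

$$\|d_j\|^2 \le \frac{1}{\beta_2^2} \|Z_j^T g_j\|^2 + \gamma_0^2 \|c_j\|^2. \tag{3.12}$$

Proof: Let $j \in J$, and first assume that $s_j \ne 0$. From (3.5) and (3.7), we have that for $j \in J$

$$\frac{1}{\beta_3} \le \frac{\|h_j\|}{\|Z_j^T g_j\|} \le \frac{1}{\beta_2},$$

which gives (3.11). Using (3.11), (2.9) and (3.8) we have

$$\|d_j\|^2 = \|h_j\|^2 + \|v_j\|^2 \le \frac{1}{\beta_2^2} \|Z_j^T g_j\|^2 + \gamma_0^2 \|c_j\|^2. \tag{3.13}$$

If $s_j = 0$ then $Z_j^T g_j = h_j = 0$ and the result clearly holds.

□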


3.1 The $\ell_1$ Merit Function

We now establish some useful results about the behavior of Algorithm 2.1 with the $\ell_1$ merit function, and use these results to establish a global convergence theorem. The following lemma shows that all the steps $d_k$ generated by Algorithm 2.1 define descent directions for the $\ell_1$ merit function, and that a significant reduction in this merit function is obtained for the good steps.

Lemma 3.3 Let the iterates $\{x_k\}$ be generated by Algorithm 2.1 using the $\ell_1$ merit function (2.19) with the weights chosen so that

$$\mu_k \ge \|\lambda(x_k)\|_\infty + \rho, \tag{3.14}$$

for all $k > 0$, where $\rho > 0$. Suppose that Assumptions 3.1 are satisfied. Then for all $k \ge 1$

$$D\phi_{\mu_k}(x_k; d_k) \le -\|Z_k^T g_k\| \|h_k\| \cos\theta_k - \rho \|c_k\|_1, \tag{3.15}$$

and there is a positive constant $b_2$ such that for all $j \in J$

$$D\phi_{\mu_j}(x_j; d_j) \le -b_2 \left[ \|Z_j^T g_j\|^2 + \|c_j\|_1 \right]. \tag{3.16}$$

Moreover for any value $\mu$ there is a positive constant $\gamma_\mu$ such that if $j \in J$ and $\mu_j = \mu$ then

$$\phi_\mu(x_j) - \phi_\mu(x_{j+1}) \ge \gamma_\mu \left[ \|Z_j^T g_j\|^2 + \|c_j\|_1 \right]. \tag{3.17}$$

Proof: From (3.2) and (3.14) it follows immediately that (3.15) holds for all $k \ge 1$. Now suppose $j \in J$. We can apply (3.11) and (3.6) to (3.15) and obtain inequality (3.16) with

$b_2 = \min(\beta_1/\beta_3, \rho)$.

To consider the decrease in $\phi_\mu$ in one iteration, for $j \in J$, note that the line search enforces the condition (2.2),

$\phi_{\mu_j}(x_j) - \phi_{\mu_j}(x_{j+1}) \ge -\eta\alpha_j D\phi_{\mu_j}(x_j; d_j)$. (3.18)

It is then clear from (3.16) that (3.17) holds, provided the $\alpha_j$ can be bounded from below. Suppose that $\alpha_j < 1$, which means that (2.2) failed for a steplength $\bar\alpha$:

$\phi_{\mu_j}(x_j + \bar\alpha d_j) - \phi_{\mu_j}(x_j) > \eta\bar\alpha D\phi_{\mu_j}(x_j; d_j)$, (3.19)

where

$\tau\bar\alpha \le \alpha_j$ (3.20)

(see step 5 of Algorithm 2.1). From (2.22) and (2.24) we have

$\phi_{\mu_j}(x_j + \bar\alpha d_j) - \phi_{\mu_j}(x_j) \le \bar\alpha D\phi_{\mu_j}(x_j; d_j) + \bar\alpha^2 b_1\|d_j\|^2$, (3.21)

where $b_1$ is a function of $\mu$. Combining (3.19) and (3.21) we have

$(\eta - 1)\bar\alpha D\phi_{\mu_j}(x_j; d_j) < \bar\alpha^2 b_1\|d_j\|^2$. (3.22)

From (3.12) and the fact that $\|c_j\|$ is uniformly bounded above we have

$\|d_j\|^2 \le b_3\left[\|Z_j^T g_j\|^2 + \|c_j\|_1\right]$, (3.23)

for $b_3 = \max(1/\beta_2^2,\ \gamma_0^2 \sup_{x \in D}\|c(x)\|)$. Combining (3.22), (3.16) and (3.23)

$\bar\alpha > \frac{(1 - \eta)\,b_2}{b_1 b_3}$. (3.24)

Thus from (3.20) we conclude that the steplengths $\alpha_j$ are bounded away from zero for all $j \in J$, and (3.17) holds with $\gamma_\mu = \eta b_2 \min\{1, (1 - \eta)b_2/(b_1 b_3)\}$.

□ 
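The backtracking argument in this proof can be illustrated with a short sketch. Algorithm 2.1 and condition (2.2) are not restated in this section, so the code below assumes that (2.2) is the sufficient decrease test appearing in (3.18), and that each rejected trial steplength is simply multiplied by a fixed factor `tau` (one admissible choice consistent with (3.20)). The function name and its parameters are hypothetical.

```python
import numpy as np

def backtracking_step(phi, dphi0, x, d, eta=0.1, tau=0.5, max_iter=50):
    """Backtrack from alpha = 1 until the sufficient decrease test holds.

    phi   : merit function (assumed callable on a NumPy vector)
    dphi0 : directional derivative of phi at x along d (must be negative)
    The accepted alpha satisfies
        phi(x) - phi(x + alpha*d) >= -eta * alpha * dphi0,
    which is the form of the test in (3.18).
    """
    if dphi0 >= 0:
        raise ValueError("d is not a descent direction for phi")
    phi0 = phi(x)
    alpha = 1.0
    for _ in range(max_iter):
        if phi0 - phi(x + alpha * d) >= -eta * alpha * dphi0:
            return alpha
        alpha *= tau  # assumed rule: next trial is tau times the rejected one
    raise RuntimeError("no acceptable steplength found")

# Toy usage on a smooth quadratic with a deliberately overlong step.
phi = lambda x: 0.5 * float(x @ x)
x0 = np.array([1.0, 1.0])
d0 = -10.0 * x0
alpha = backtracking_step(phi, float(x0 @ d0), x0, d0)
```

With this rule, an accepted steplength below one is at least `tau` times a rejected steplength, which is how a lower bound of the form (3.24) on the rejected steplength carries over to the accepted one.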

Now that we know from (3.15) that the line search can guarantee decrease in $\phi$ at every iteration, and from (3.17) that $\phi$ decreases significantly at the good iterates, we can prove a global convergence result for the $\ell_1$ merit function. (Actually (3.17) is stronger than we need for global convergence, but we will make full use of it in Section 4 to prove local R-linear convergence.)


Theorem 3.4 Let the sequence $\{x_k\}$ be generated by Algorithm 2.1 using the $\ell_1$ merit function with weights $\{\mu_k\}$ chosen by (2.27). Suppose that Assumptions 3.1 are satisfied. Then the weights $\{\mu_k\}$ are constant for all sufficiently large $k$ and $\liminf_{k \to \infty}(\|Z_k^T g_k\| + \|c_k\|) = 0$.

Proof: First note that by Assumptions 3.1 and (2.21) $\{\|\lambda_k\|\}$ is bounded. Therefore, since the procedure (2.27) increases $\mu_k$ by at least $\rho$ whenever it changes the weight, it follows that there is an index $k_0$ and a value $\mu$ such that for all $k \ge k_0$, $\mu_k = \mu \ge \|\lambda_k\| + \rho$. Now by Assumption 3.1-3 there is a set $J$ of good iterates, and by Lemma 3.3 and the fact that $\phi_\mu(x_k)$ decreases at each iterate, we have that for $k \ge k_0$,

$$\begin{aligned}
\phi_\mu(x_{k_0}) - \phi_\mu(x_{k+1}) &= \sum_{j=k_0}^{k}\left(\phi_\mu(x_j) - \phi_\mu(x_{j+1})\right) \\
&\ge \sum_{j \in J \cap [k_0,k]}\left(\phi_\mu(x_j) - \phi_\mu(x_{j+1})\right) \\
&\ge \gamma_\mu \sum_{j \in J \cap [k_0,k]}\left[\|Z_j^T g_j\|^2 + \|c_j\|_1\right].
\end{aligned}$$

By Assumption 3.1-1 $\phi_\mu(x)$ is bounded below for all $x \in D$, so the sum is finite, and thus the term inside the square brackets converges to zero. Therefore

$\lim_{j \to \infty,\ j \in J}\left(\|Z_j^T g_j\| + \|c_j\|_1\right) = 0$, (3.25)

and since, by Theorem 3.1, $J$ is infinite the theorem follows.


□ 
Actually this result could have been proved with the boundedness of $|f|$ and $\|c\|$ in Assumption 3.1 replaced with the assumption that $\phi_{\mu_k}$ is bounded below over $D$ for some $k$, but the analysis would have been somewhat more complicated.


3.2  Fletcher’s  Merit  Function 

Now we consider Algorithm 2.1 using Fletcher's merit function (2.20). Even though the analysis is similar to that with the $\ell_1$ merit function, we will be forced to make some additional optimistic assumptions in order to establish convergence.

Recall the directional derivative (3.3),

$D\Phi_{\nu_k}(x_k; d_k) = -\|Z_k^T g_k\|\,\|h_k\|\cos\theta_k + c_k^T\lambda'_k d_k - \nu_k\|c_k\|^2$. (3.26)

In this case the weight $\nu_k$ appears to be playing the same role as the difference $(\mu_k - \|\lambda_k\|_\infty)$ does in (3.2). However, since the term involving the derivative of $\lambda$ appears to be of unpredictable sign, $\nu_k$ may have to be increased to ensure that the descent condition holds. Considering (3.26) we see that $d_k$ is a descent direction if and only if

$\nu_k > \frac{c_k^T\lambda'_k d_k - \|Z_k^T g_k\|\,\|h_k\|\cos\theta_k}{\|c_k\|^2}$. (3.27)

(If $\|c_k\| = 0$ we obtain a strong direction of descent for any choice of $\nu_k$, and the analysis that follows becomes very simple. We therefore assume that $\|c_k\| \ne 0$.) Condition (3.27) certainly appears more complex than the corresponding condition (3.14) for the $\ell_1$ function. Setting that issue aside for the moment, we now show that if we choose $\nu_k$ to satisfy a slightly stronger condition than (3.27) we can prove a result analogous to Lemma 3.3.


Lemma 3.5 Let the iterates $\{x_k\}$ be generated by Algorithm 2.1 using Fletcher's merit function (2.20) where, for all $k \ge 1$, the weights are chosen so that

$\nu_k \ge \frac{c_k^T\lambda'_k d_k + \frac{1}{2} g_k^T h_k}{\|c_k\|^2} + \rho \equiv \bar\nu_k + \rho$, (3.28)

for some positive constant $\rho$. Suppose that Assumptions 3.1 are satisfied. Then for all $k \ge 1$ we have that

$D\Phi_{\nu_k}(x_k; d_k) \le -\frac{1}{2}\|Z_k^T g_k\|\,\|h_k\|\cos\theta_k - \rho\|c_k\|^2$, (3.29)

and there exists a positive constant $b_4$ such that, for all $j \in J$,

$D\Phi_{\nu_j}(x_j; d_j) \le -b_4\left[\|Z_j^T g_j\|^2 + \|c_j\|^2\right]$. (3.30)

Moreover, for any value $\nu$ there is a constant $\gamma_\nu$ such that, if $j \in J$ and $\nu_j = \nu$,

$\Phi_\nu(x_j) - \Phi_\nu(x_{j+1}) \ge \gamma_\nu\left[\|Z_j^T g_j\|^2 + \|c_j\|^2\right]$. (3.31)


Proof: From (2.29) and the definition of $\bar\nu_k$

$D\Phi_{\nu_k}(x_k; d_k) = \frac{1}{2} g_k^T h_k + (\bar\nu_k - \nu_k)\|c_k\|^2$, (3.32)

and using (3.28) and (3.1), equation (3.29) follows. Next, note that, for $j \in J$, equation (3.30) follows from (3.29) using (3.11) and (3.6).

The rest of the proof is analogous to the proof of Lemma 3.3. Since the line search enforces the condition (2.2), it is clear from (3.30) that (3.31) holds, provided the $\alpha_j$ can be bounded from below. As in the proof of Lemma 3.3 we see that if $\alpha_j < 1$, we have (3.19) and (3.20) for the Fletcher function. Using Taylor's theorem we see that (3.21) also holds in this case, except that $b_1$ now stands for a constant different from the one defined before (2.22). We therefore obtain (3.22). From (3.12) we have

$\|d_j\|^2 \le b_5\left[\|Z_j^T g_j\|^2 + \|c_j\|^2\right]$, (3.33)

for some positive constant $b_5$. We see, from (3.22), (3.30), and (3.33), that

$\bar\alpha > \frac{(1 - \eta)\,b_4}{b_1 b_5}$. (3.34)

Thus from (3.20) we conclude that the steplengths are bounded away from zero for all $j \in J$.

□ 
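The first step of this proof can be written out explicitly. The lines below are a reconstruction from (3.26), (3.28) and (3.32), assuming that (3.1), which is not restated in this section, is the identity $g_k^T h_k = -\|Z_k^T g_k\|\,\|h_k\|\cos\theta_k$.

```latex
\begin{aligned}
D\Phi_{\nu_k}(x_k; d_k)
  &= g_k^T h_k + c_k^T \lambda'_k d_k - \nu_k \|c_k\|^2 \\
  &= \tfrac{1}{2} g_k^T h_k
     + \bigl( c_k^T \lambda'_k d_k + \tfrac{1}{2} g_k^T h_k \bigr)
     - \nu_k \|c_k\|^2 \\
  &= \tfrac{1}{2} g_k^T h_k + (\bar\nu_k - \nu_k)\|c_k\|^2 \\
  &\le -\tfrac{1}{2} \|Z_k^T g_k\| \, \|h_k\| \cos\theta_k - \rho \|c_k\|^2 ,
\end{aligned}
```

where the third line uses the definition of $\bar\nu_k$ in (3.28) and the last line uses $\nu_k \ge \bar\nu_k + \rho$.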

Note that (3.28) gives a computable value, and $\nu_k$ could be increased if necessary, at each iteration, to satisfy (3.28). In order to use Lemma 3.5 to prove any convergence result we must know that eventually $\nu_k$ becomes fixed while still satisfying (3.28). Therefore, by analogy with (2.27), we suggest choosing $\nu_k$ at each iteration by

$\nu_k = \begin{cases} \nu_{k-1} & \text{if } \nu_{k-1} \ge \bar\nu_k + \rho \\ \bar\nu_k + 2\rho & \text{otherwise,} \end{cases}$ (3.35)

where $\rho$ is some positive constant.
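The update rule (3.35) is small enough to state as code. The sketch below is only an illustration; the function name `update_weight` and the sample values of $\bar\nu_k$ are hypothetical.

```python
def update_weight(nu_prev, nu_bar, rho):
    """Weight update of the form (3.35).

    Keep the previous weight if it already exceeds nu_bar by at least rho;
    otherwise reset it to nu_bar + 2*rho, which raises it by more than rho.
    """
    if nu_prev >= nu_bar + rho:
        return nu_prev
    return nu_bar + 2.0 * rho

# Illustrative run with made-up, bounded values of nu_bar.
nu = 0.0
history = []
for nu_bar in [1.0, 0.5, 2.0, 1.5, 2.0, 0.0]:
    nu = update_weight(nu, nu_bar, rho=0.5)
    history.append(nu)
```

Because every change increases the weight by more than $\rho$, a bounded sequence $\{\bar\nu_k\}$ allows only finitely many changes, which is the mechanism used in the convergence theorems that follow.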

Note that the sequence $\{\nu_k\}$ will diverge if $\{\bar\nu_k\}$ is unbounded, and in that case Lemma 3.5 cannot be used to prove convergence. Thus it is essential that the sequence $\bar\nu_k$ be bounded. However, in contrast to $\|\lambda_k\|$, the quantity $\bar\nu_k$ depends on $d_k$, and thus $B_k$, as well as on $x_k$, making its boundedness a difficult question. The most we are able to say about the boundedness of $\bar\nu_k$ is contained in the following result.


Lemma 3.6 Suppose that the iterates $\{x_k\}$ are generated by Algorithm 2.1 using Fletcher's merit function (2.20) and that Assumptions 3.1 are satisfied. Then, there is a constant such that for any $k$,


(3.36) 


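and the sequence $\{\bar\nu_j\}$ is thus uniformly bounded above for all $j \in J$.

Proof: By the geometric/arithmetic mean inequality,

$c_k^T\lambda'_k h_k \le \frac{1}{2}|g_k^T h_k| + \frac{1}{2}\frac{(c_k^T\lambda'_k h_k)^2}{|g_k^T h_k|}$.

Therefore, since $g_k^T h_k \le 0$, by (3.28), (2.8)-(2.10), and (3.8)

$$\begin{aligned}
\bar\nu_k &\le \left[\frac{(c_k^T\lambda'_k h_k)^2}{2|g_k^T h_k|} + c_k^T\lambda'_k v_k\right]\frac{1}{\|c_k\|^2} \\
&\le \left[\frac{(\|c_k\|\,\|\lambda'_k\|\,\|s_k\|)^2}{2 s_k^T B_k s_k} + \|c_k\|\,\|\lambda'_k\|\,\|v_k\|\right]\frac{1}{\|c_k\|^2} \\
&\le \frac{\|\lambda'_k\|^2}{2}\,\frac{s_k^T s_k}{s_k^T B_k s_k} + \gamma_0\|\lambda'_k\|.
\end{aligned}$$

Referring to (2.21) we note that by Assumptions 3.1-1 and 3.1-2, $\|\lambda'_k\|$ is uniformly bounded for all $x_k$. By (3.6) and (3.7) it follows that $\{\bar\nu_j\}$ is bounded for all $j \in J$. □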

This result is not as strong as one might hope for, since we are not able to bound the Rayleigh quotient $s_k^T B_k s_k / s_k^T s_k$ away from zero for all $k$. Therefore we cannot rule out the possibility that a subsequence of these Rayleigh quotients goes to zero in such a way that $\{\nu_k\}$ must diverge to yield a descent direction at each iteration. It is not clear whether this is likely to be a problem in practice or not. It is interesting to note that Powell and Yuan (1986) avoid these difficulties, when analyzing the Fletcher function, by assuming a priori that $\|B_k\|$ and $\|B_k^{-1}\|$ are bounded. Under these conditions they show that, if $\nu_k$ is chosen by a procedure analogous to (3.28), it will be bounded.

Therefore, to prove a global convergence theorem analogous to Theorem 3.4 we will simply make the optimistic assumption that the sequence $\{\bar\nu_k\}$ is bounded.


Theorem 3.7 Let the sequence $\{x_k\}$ be generated by Algorithm 2.1 using the Fletcher merit function with the weights $\nu_k$ chosen by (3.35). Suppose that Assumptions 3.1 are satisfied and that the sequence $\{\bar\nu_k\}$ defined by (3.28) is bounded above for all $k$. Then $\nu_k$ is eventually constant and $\liminf_{k \to \infty}(\|Z_k^T g_k\| + \|c_k\|) = 0$.


Proof: Since the sequence $\{\bar\nu_k\}$ is bounded, the procedure (3.35) guarantees that $\nu_k$ will eventually be constant. By Assumptions 3.1, $\Phi_\nu$ is bounded below for all $x \in D$. Then, using Lemma 3.5, the result follows by the same argument as in the proof of Theorem 3.4.

□


4.  Local  Convergence 

Now we consider a local minimizer $x_*$ that satisfies the second order sufficiency conditions, and show that the algorithm is locally and R-linearly convergent to it. We will make the following assumptions in a neighborhood of $x_*$, and for the rest of the paper, these replace Assumptions 3.1.

Assumptions 4.1 The point $x_*$ is a local minimizer for problem (1.1) at which the following conditions hold.

(1) The functions $f : R^n \to R$ and $c : R^n \to R^t$ are three times continuously differentiable in a neighborhood of $x_*$.

(2) The matrix $A(x_*)$ has full column rank. This implies that $x_*$ is a Karush-Kuhn-Tucker point of (1.1), i.e. there exists a vector $\lambda_* \in R^t$ such that

$\nabla_x L(x_*, \lambda_*) = g(x_*) + A(x_*)\lambda_* = 0$.

(3) For all $w \in R^{n-t}$, $w \ne 0$, we have $w^T G_* w > 0$.

Note that (1) and (2) imply that there are constants $\gamma_0$, $\gamma_L$ such that, for all $x$ near $x_*$,

$\|A(x)[A(x)^T A(x)]^{-1}\| \le \gamma_0$, (4.1)

and for all $x$ and $z$ near $x_*$,

$\|\lambda(x) - \lambda(z)\| \le \gamma_L\|x - z\|$, (4.2)

where $\lambda(x)$ is given by (2.21). Also, (1) and (3) imply that for all $(x, \lambda)$ sufficiently near $(x_*, \lambda_*)$, and for all $w \in R^{n-t}$,

$m\|w\|^2 \le w^T G(x, \lambda) w \le M\|w\|^2$, (4.3)

for some positive constants $m$, $M$. The condition $f, c \in C^3$ is only needed for Fletcher's function; for the $\ell_1$ merit function it suffices to assume that $f, c \in C^2$ and that their Hessians are Lipschitz continuous near $x_*$.

We need to establish some results about such a local minimizer and its relationship to the merit functions. First we note that, near $x_*$, the quantities $c(x)$ and $Z(x)^T g(x)$ may be regarded as a measure of the error at $x$. This result is not new (see e.g. Powell (1978)), but we give a proof for the sake of completeness. We recall that $Z(x)$ stands for any orthogonal matrix with the property $A(x)^T Z(x) = 0$.

Lemma 4.1 If Assumptions 4.1 hold, then for all $x$ sufficiently near $x_*$

$\gamma_1\|x - x_*\| \le \|c(x)\| + \|Z(x)^T g(x)\| \le \gamma_2\|x - x_*\|$, (4.4)

for some positive constants $\gamma_1$, $\gamma_2$.

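Proof: Define the function $H : R^{n+t} \to R^{n+t}$ by

$H(x, \lambda) = \begin{bmatrix} \nabla_x L(x, \lambda) \\ c(x) \end{bmatrix}$.

Then $H(x_*, \lambda_*) = 0$, and

$H'(x_*, \lambda_*) = \begin{bmatrix} \nabla^2_{xx} L(x_*, \lambda_*) & A(x_*) \\ A(x_*)^T & 0 \end{bmatrix}$.

We note that $H'(x_*, \lambda_*)$ is nonsingular, for if $H'(x_*, \lambda_*)(u^T, v^T)^T = 0$ for some $u \in R^n$ and some $v \in R^t$, then

$\nabla^2_{xx} L(x_*, \lambda_*)u + A(x_*)v = 0$ (4.5)

$A(x_*)^T u = 0$. (4.6)

Thus $u^T \nabla^2_{xx} L(x_*, \lambda_*)u = 0$, and by (4.6) and Assumption 4.1-3 this implies that $u = 0$. Then, since $A(x_*)$ has full rank, (4.5) implies $v = 0$. Therefore $H'(x_*, \lambda_*)$ is nonsingular.

Let $\|\cdot\|_e$ denote the norm defined by $\|(u^T, v^T)^T\|_e = \|u\| + \|v\|$, for vectors in $R^{n+t}$, and by the corresponding induced matrix norm, for $(n + t) \times (n + t)$ matrices. The differentiability of $H$ at $(x_*, \lambda_*)$ implies that for any $\epsilon > 0$,

$\left\| H(x, \lambda) - H'(x_*, \lambda_*) \begin{bmatrix} x - x_* \\ \lambda - \lambda_* \end{bmatrix} \right\|_e \le \epsilon\left(\|x - x_*\| + \|\lambda - \lambda_*\|\right)$,

for all $(x, \lambda)$ sufficiently close to $(x_*, \lambda_*)$. Since $H'(x_*, \lambda_*)$ is nonsingular, if $\epsilon$ is taken sufficiently small it follows that

$\gamma_1\left(\|x - x_*\| + \|\lambda - \lambda_*\|\right) \le \|H(x, \lambda)\|_e \le \hat\gamma_2\left(\|x - x_*\| + \|\lambda - \lambda_*\|\right)$, (4.7)

where $\hat\gamma_2 = \|H'(x_*, \lambda_*)\|_e + \epsilon$ and $\gamma_1 = 1/\|H'(x_*, \lambda_*)^{-1}\|_e - \epsilon$. If we set $\lambda = \lambda(x)$, the least squares multiplier, in (4.7) then since $\nabla_x L(x, \lambda(x)) = Z(x)Z(x)^T g(x)$, the left inequality in (4.4) follows immediately, and the right inequality follows from (4.2) if we let $\gamma_2 = \hat\gamma_2(1 + \gamma_L)$.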

□ 
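Lemma 4.1 can be checked numerically on a small example. The sketch below uses a hypothetical two-variable problem that is not from the paper: minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, whose solution is $x_* = (-1, -1)$ with multiplier $\lambda_* = 1/2$ (so that $g(x_*) + A(x_*)\lambda_* = 0$). It samples points on a small circle around $x_*$ and reports the smallest and largest value of the ratio $(\|c(x)\| + \|Z(x)^T g(x)\|)/\|x - x_*\|$, which play the roles of $\gamma_1$ and $\gamma_2$ in (4.4).

```python
import numpy as np

# Hypothetical test problem: min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0.
x_star = np.array([-1.0, -1.0])

def error_measure(x):
    """Return |c(x)| + |Z(x)^T g(x)| for the toy problem above."""
    g = np.array([1.0, 1.0])          # gradient of f(x) = x1 + x2
    c = x @ x - 2.0                   # the single constraint value
    a = 2.0 * x                       # gradient of c, i.e. the n x 1 matrix A(x)
    z = np.array([-a[1], a[0]]) / np.linalg.norm(a)  # unit vector with a^T z = 0
    return abs(c) + abs(z @ g)

def ratio_range(delta, n_dirs=64):
    """Min and max of error_measure(x)/||x - x_star|| on a circle of radius delta."""
    ratios = []
    for theta in np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False):
        x = x_star + delta * np.array([np.cos(theta), np.sin(theta)])
        ratios.append(error_measure(x) / delta)
    return min(ratios), max(ratios)

lo, hi = ratio_range(1e-3)
```

For this example a first order expansion suggests the ratio stays roughly between 1 and 3 for small radii, so both inequalities in (4.4) are visible: the measure vanishes only at $x_*$ and grows linearly with the distance to it.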

Now  we  show  that,  for  a  fixed  weight,  either  merit  function  may  also  be  regarded  as 
a  measure  of  the  error. 


Lemma 4.2 Suppose that Assumptions 4.1 hold at $x_*$. Then for any $\mu > \|\lambda_*\|_\infty$ there exist constants $\gamma_3$ and $\gamma_4$, such that for all $\|x - x_*\|$ sufficiently small

$\gamma_3\|x - x_*\|^2 \le \phi_\mu(x) - \phi_\mu(x_*) \le \gamma_4\left[\|Z(x)^T g(x)\|^2 + \|c(x)\|_1\right]$. (4.8)

Furthermore, for any $\nu$ sufficiently large there are constants $\gamma_5$ and $\gamma_6$ such that for all $\|x - x_*\|$ sufficiently small

$\gamma_5\|x - x_*\|^2 \le \Phi_\nu(x) - \Phi_\nu(x_*) \le \gamma_6\left[\|Z(x)^T g(x)\|^2 + \|c(x)\|^2\right]$. (4.9)
Proof: First we consider the Fletcher merit function, which by Assumptions 4.1 is at least twice continuously differentiable near $x_*$. We have

$\nabla\Phi_\nu(x) = g(x) + A(x)[\lambda(x) + \nu c(x)] + \lambda'(x)^T c(x)$,

$\nabla^2\Phi_\nu(x_*) = \nabla^2_{xx} L(x_*, \lambda_*) + A_*\lambda'(x_*) + \lambda'(x_*)^T A_*^T + \nu A_* A_*^T$.

By Lemma 4.1, and since $\nabla\Phi_\nu(x_*) = 0$, we have that for any $\epsilon > 0$ there is a constant $\gamma_6$ such that

$\Phi_\nu(x) - \Phi_\nu(x_*) \le \frac{1}{2}\left(\|\nabla^2\Phi_\nu(x_*)\| + \epsilon\right)\|x - x_*\|^2 \le \gamma_6\left[\|Z(x)^T g(x)\|^2 + \|c(x)\|^2\right]$,

for all $x$ sufficiently near $x_*$.

To establish the left inequality we define

$G = \nabla^2_{xx} L(x_*, \lambda_*) + A_*\lambda'(x_*) + \lambda'(x_*)^T A_*^T$,

so that $G + \nu A_* A_*^T = \nabla^2\Phi_\nu(x_*)$. Note that $Z_*^T G Z_*$ is positive definite. We now show that $G + \nu A_* A_*^T$ is positive definite for $\nu$ sufficiently large.

Let $K$ be an $n \times t$ matrix with full column rank such that $Z_*^T G K = 0$. The span of $K$ could be considered as a subspace that is $G$ conjugate to the span of $Z_*$. Note that the $t \times t$ matrix $A_*^T K$ is nonsingular, since if $A_*^T K v = 0$ for some $v \in R^t$ then $Kv = Z_* w$ for some $w \in R^{n-t}$. But then $Z_*^T G Z_* w = Z_*^T G K v = 0$, which implies that $w = 0$, and so $v = 0$.

Now consider the $n \times n$ matrix

$[Z_*\ K]^T\left(G + \nu A_* A_*^T\right)[Z_*\ K] = \begin{bmatrix} Z_*^T G Z_* & 0 \\ 0 & K^T G K + \nu K^T A_* A_*^T K \end{bmatrix}$. (4.10)

The matrix on the right hand side is positive definite if $\nu$ is greater than the negative of the smallest eigenvalue of $(K^T A_*)^{-1} K^T G K (A_*^T K)^{-1}$. In this case, since the product of the three matrices on the left side is nonsingular, the matrix $[Z_*\ K]$ must be nonsingular, and thus $G + \nu A_* A_*^T = \nabla^2\Phi_\nu(x_*)$ is positive definite for such $\nu$.

Since $\nabla^2\Phi_\nu(x)$ is continuous, there is a constant $\gamma_5 > 0$ such that for all $x$ in some neighborhood of $x_*$, all eigenvalues of $\nabla^2\Phi_\nu(x)$ are greater than $2\gamma_5$. Therefore, since $\nabla\Phi_\nu(x_*) = 0$,

$\Phi_\nu(x) - \Phi_\nu(x_*) \ge \gamma_5\|x - x_*\|^2$.

We now treat the $\ell_1$ merit function with some fixed value of $\mu > \|\lambda_*\|_\infty$. Consider a neighborhood $N$ of $x_*$ over which (4.9) holds for some $\nu$, and such that $\mu - \|\lambda(x)\|_\infty \ge \frac{1}{2}[\mu - \|\lambda_*\|_\infty]$, and $\|c(x)\| \le \frac{1}{2\nu}[\mu - \|\lambda_*\|_\infty]$ for all $x \in N$. Then we have that for $x \in N$

$$\begin{aligned}
\phi_\mu(x) &= \Phi_\nu(x) - \lambda(x)^T c(x) - \tfrac{1}{2}\nu\|c(x)\|^2 + \mu\|c(x)\|_1 \\
&\ge \Phi_\nu(x) + \left[\mu - \|\lambda(x)\|_\infty - \tfrac{1}{2}\nu\|c(x)\|\right]\|c(x)\|_1 \\
&\ge \Phi_\nu(x) + \tfrac{1}{4}\left[\mu - \|\lambda_*\|_\infty\right]\|c(x)\|_1.
\end{aligned}$$

Since $\phi_\mu(x_*) = \Phi_\nu(x_*)$ the left inequality of (4.8) follows from (4.9) with $\gamma_3 = \gamma_5$. Now

$$\begin{aligned}
\phi_\mu(x) &\le L(x, \lambda_*) + (\mu + \|\lambda_*\|_\infty)\|c(x)\|_1 \\
&\le L(x_*, \lambda_*) + \|\nabla^2_{xx} L(x_*, \lambda_*)\|\,\|x - x_*\|^2 + (\mu + \|\lambda_*\|_\infty)\|c(x)\|_1.
\end{aligned}$$

Since $L(x_*, \lambda_*) = \phi_\mu(x_*)$, the right inequality follows from (4.4), and from the boundedness of $\|c(x)\|$ near $x_*$.

□
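The two merit functions compared in this proof can be evaluated side by side. Their definitions (2.19) and (2.20) are not restated in this section, so the sketch below assumes the forms implied by the identity used above, namely $\phi_\mu(x) = f(x) + \mu\|c(x)\|_1$ and $\Phi_\nu(x) = f(x) + \lambda(x)^T c(x) + \frac{1}{2}\nu\|c(x)\|^2$, with $\lambda(x)$ the least squares multiplier. All function names are hypothetical, and the test problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 = 2$) is not from the paper.

```python
import numpy as np

def least_squares_multiplier(g, A):
    """Multiplier lambda minimizing ||g + A @ lambda||_2 (A is n x t, full rank)."""
    lam, *_ = np.linalg.lstsq(A, -g, rcond=None)
    return lam

def l1_merit(f, c, mu):
    """Assumed form of the l1 merit function: f + mu * ||c||_1."""
    return f + mu * np.sum(np.abs(c))

def fletcher_merit(f, g, A, c, nu):
    """Assumed form of Fletcher's function: f + lambda(x)^T c + (nu/2) ||c||^2."""
    lam = least_squares_multiplier(g, A)
    return f + lam @ c + 0.5 * nu * (c @ c)

def problem(x):
    """Toy problem data: value, gradient, constraint vector and n x 1 matrix A."""
    f = x[0] + x[1]
    g = np.array([1.0, 1.0])
    c = np.array([x @ x - 2.0])
    A = (2.0 * x).reshape(2, 1)
    return f, g, c, A

mu, nu = 1.0, 10.0
f0, g0, c0, A0 = problem(np.array([-1.0, -1.0]))   # the solution x_*
f1, g1, c1, A1 = problem(np.array([-1.0, -1.1]))   # a nearby infeasible point
phi_star, Phi_star = l1_merit(f0, c0, mu), fletcher_merit(f0, g0, A0, c0, nu)
phi_near, Phi_near = l1_merit(f1, c1, mu), fletcher_merit(f1, g1, A1, c1, nu)
```

At the solution both functions equal $f(x_*)$, and for these weights both are larger at the nearby point, in line with the strict local minimizer property that Lemma 4.2 provides.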

A consequence of this lemma is that, for a sufficiently large value of the weight, either merit function will have a strong local minimizer at $x_*$. We would like to use the descent property of Algorithm 2.1 to show that $x_*$ is a point of attraction of the algorithm. To do this we make the following assumption on the line search.

Assumption 4.2 The line search has the property that, for $x_k$ sufficiently close to $x_*$, $\phi((1 - \theta)x_k + \theta x_{k+1}) \le \phi(x_k)$ for all $\theta \in [0, 1]$.

This assumption is rather similar to, but weaker than, the Curry-Altman condition, and similarly, there is no practical line search algorithm which can guarantee it absolutely. However, it seems unlikely that it is violated close to $x_*$. We should note that an assumption of this type is needed also in the context of unconstrained optimization; see for example §7 of Byrd, Nocedal and Yuan (1987).

Now we consider Algorithm 2.1 using the $\ell_1$ merit function and show that if an iterate $x_k$ gets close enough to $x_*$, with $k$ large enough, the sequence will stay close to $x_*$ and converge to $x_*$ R-linearly.

Theorem 4.3 Let $\{x_k\}$ be generated by Algorithm 2.1 using the $\ell_1$ merit function (2.19), with $\mu_k$ chosen by (2.27). Suppose that $x_*$ satisfies Assumptions 4.1, that Assumption 4.2 holds, and that $\{\|\lambda(x_k)\|\}$ is bounded. Then the weight has a fixed value $\mu$ for all sufficiently large $k$, and there is a neighborhood of $x_*$ such that if any iterate $x_{k_0}$ falls in that neighborhood, with $\mu_{k_0} = \mu$, then $\{x_k\} \to x_*$. Furthermore

$\phi_\mu(x_{k+1}) - \phi_\mu(x_*) \le r^{k-k_0}\left[\phi_\mu(x_{k_0}) - \phi_\mu(x_*)\right]$, $k \ge k_0$ (4.11)

for some constant $r < 1$, and

$\sum_{k=1}^{\infty}\|x_k - x_*\| < \infty$. (4.12)

Proof: By Assumptions 4.1 there exists $\delta_1 > 0$ such that, for all $x$ in the neighborhood $N_1 = \{x : \|x - x_*\| < \delta_1\}$, Assumptions 3.1-1 and 3.1-2 are satisfied, and

$\|\lambda(x)\|_\infty + \rho > \|\lambda_*\|_\infty$. (4.13)

Also, by choosing $\delta_1$ small enough we can guarantee (as in Lemma 2.1) that, if $x_k$ and $x_{k+1}$ are in $N_1$ and $x_k$ satisfies (2.12), then Assumption 3.1-3 is satisfied.

Now, since $\{\|\lambda(x_k)\|_\infty\}$ is bounded, the procedure (2.27) implies that for all $k$ greater than some value $\hat k$, $\mu_k$ is fixed at some value $\mu$. By (4.13) and (2.27), if an iterate $x_k$, with $k > \hat k$, occurs in $N_1$ then it must be that $\mu > \|\lambda_*\|_\infty$. For such $\mu$ it follows from Lemma 4.2 that the function $\phi_\mu$ has a strict local minimizer at $x_*$. Therefore, there exists $\delta_2 \in (0, \delta_1]$ such that if $\|x - x_*\| < \delta_2$, the connected component of the level set $\{z : \phi_\mu(z) \le \phi_\mu(x)\}$ containing $x_*$ is a subset of $N_1$ over which equation (4.8) holds.

Now Assumption 4.2 implies that if for some $k_0 > \hat k$, $\|x_{k_0} - x_*\| < \delta_2$, then $x_k \in N_1$ for all $k \ge k_0$, since $\phi_\mu$ is decreased at each step.

Thus we have that Assumptions 3.1 hold on $N_1$ for $k \ge k_0$, and we may identify $N_1$ with the set $D$ of those assumptions, so that all of the results in Subsection 3.1 for the $\ell_1$ merit function hold for $k \ge k_0$. Therefore, if $B_{k_0}$ is positive definite $B_k$ remains positive definite for all subsequent iterates, and by Theorem 3.1 there is a set of good iterates $J$. From Lemma 3.3 and Lemma 4.2 we have, for all $j \in J$, $j \ge k_0$,

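$\phi_\mu(x_j) - \phi_\mu(x_{j+1}) \ge \frac{\gamma_\mu}{\gamma_4}\left[\phi_\mu(x_j) - \phi_\mu(x_*)\right]$, (4.14)

and so

$\phi_\mu(x_{j+1}) - \phi_\mu(x_*) \le r^{6/5}\left[\phi_\mu(x_j) - \phi_\mu(x_*)\right]$,

where $r^{6/5} \equiv 1 - \gamma_\mu/\gamma_4 < 1$. From Lemma 3.3 we see that $\phi_\mu(x_{k+1}) \le \phi_\mu(x_k)$ for all $k$, and since $J \cap [k_0, k]$ has at least $\lceil 5(k - k_0)/6 \rceil$ elements, we have for all $k \ge k_0$

$\phi_\mu(x_{k+1}) - \phi_\mu(x_*) \le r^{k-k_0}\left[\phi_\mu(x_{k_0}) - \phi_\mu(x_*)\right]$.

From this relation and (4.8) we obtain

$$\begin{aligned}
\sum_{k=1}^{\infty}\|x_k - x_*\| &\le \sum_{k=1}^{k_0}\|x_k - x_*\| + \gamma_3^{-1/2}\sum_{k=k_0}^{\infty}\left[\phi_\mu(x_{k+1}) - \phi_\mu(x_*)\right]^{1/2} \\
&\le \sum_{k=1}^{k_0}\|x_k - x_*\| + \left[\frac{\phi_\mu(x_{k_0}) - \phi_\mu(x_*)}{\gamma_3}\right]^{1/2}\sum_{k=k_0}^{\infty}\left(r^{1/2}\right)^{k-k_0} \\
&< \infty.
\end{aligned}$$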

□ 

It is possible to strengthen this result and show that there is a neighborhood of $x_*$ such that if any iterate lands in the neighborhood, the sequence converges to $x_*$ R-linearly. However, the analysis of this result is much more complex.

Note that the local result of Theorem 4.3 fits together well with the global analysis of Section 3.1. If Assumptions 3.1 hold for a set $D$ which is in addition compact then by Theorem 3.4 the sequence $\{x_k\}$ will have a cluster point that is a stationary point. If this stationary point satisfies Assumptions 4.1 then Theorem 4.3 implies that the sequence will converge to it R-linearly.

For Fletcher's merit function one cannot show such a strong result since, as was discussed in Section 3.2, there appear to be no assumptions on the problem that will guarantee $\{\bar\nu_k\}$ is bounded. However, if we make the optimistic assumption that the sequence $\{\bar\nu_k\}$ defined by (3.28) is uniformly bounded, we may prove an R-linear convergence result.

Theorem 4.4 Let $\{x_k\}$ be generated by Algorithm 2.1 using the Fletcher merit function (2.20), with $\nu_k$ chosen by (3.35). Suppose that $x_*$ satisfies Assumptions 4.1, that Assumption 4.2 holds, that the sequence $\{\bar\nu_k\}$ defined by (3.28) is bounded, and that $\nu_k$ is eventually large enough to satisfy the conditions of Lemma 4.2. Then the weight has a fixed value $\nu$ for all sufficiently large $k$, and there is a neighborhood of $x_*$ such that if any iterate $x_{k_0}$ falls in that neighborhood, with $\nu_{k_0} = \nu$, then $\{x_k\} \to x_*$. Furthermore

$\Phi_\nu(x_{k+1}) - \Phi_\nu(x_*) \le r^{k-k_0}\left[\Phi_\nu(x_{k_0}) - \Phi_\nu(x_*)\right]$, $k \ge k_0$ (4.15)

for some constant $r < 1$, and

$\sum_{k=1}^{\infty}\|x_k - x_*\| < \infty$. (4.16)

Proof: By the assumed boundedness of $\{\bar\nu_k\}$, the procedure (3.35) guarantees that the weight $\nu_k$ is equal to some fixed value $\nu$ for all $k$ sufficiently large. Since we also assume that eventually $\nu_k$ becomes large enough that (4.9) holds for some constants $\gamma_5$ and $\gamma_6$, then Assumption 4.2 implies the sequence eventually stays in a neighborhood in which Assumptions 3.1 hold. At this point Lemma 3.5 and Lemma 4.2 imply that

$\Phi_\nu(x_k) - \Phi_\nu(x_{k+1}) \ge \frac{\gamma_\nu}{\gamma_6}\left[\Phi_\nu(x_k) - \Phi_\nu(x_*)\right]$. (4.17)

This expression has the same form as equation (4.14) in the proof of Theorem 4.3, and the result follows by the same argument, using equation (4.9) in place of (4.8).

□ 

It is interesting to note that, once R-linear convergence has been established, it follows that $\|B_k\|$ and $\|B_k^{-1}\|$ are uniformly bounded (we prove this later in Theorem 5.1). Then, by Lemma 3.6 we have that $\bar\nu_k$ is bounded. However, we know of no way to establish the boundedness of $\bar\nu_k$ a priori, and thus give a proof of R-linear convergence of the algorithm using the Fletcher function without making such optimistic assumptions.

5.  Superlinear  Convergence 

We have shown in §4 that Algorithm 2.1 is R-linearly convergent. We now investigate whether superlinear convergence occurs, under the assumptions of §4. In §5.1 we discuss the relevant properties of the null space basis and give an attainable condition which, as we show in §5.2, implies a consistency property of $B_k$ yielding two-step superlinear convergence, if steplengths of one are eventually taken at every iteration. For the Fletcher function this implies superlinear convergence of Algorithm 2.1, as we show in §5.3. However, with the $\ell_1$ function steplengths of one may be impossible even very close to the solution. In §5.4-5 we consider two modified versions of Algorithm 2.1 and show that they both overcome this difficulty and yield two-step superlinear convergence.


5.1  Choice  of  null  space  basis. 

The results of §4 only require of the matrix $Z_k$ that its columns form an orthonormal basis for the null space of $A_k^T$, i.e. that $A_k^T Z_k = 0$, and $Z_k^T Z_k = I$. However, this does not completely specify $Z_k$, and if the choice of null space basis changes too much from one iterate to the next, superlinear convergence can be impeded. Byrd and Schnabel (1986) point out that any algorithm that chooses $Z_k$ as a function of $A(x_k)$ alone will have discontinuities at some points. Coleman and Sorensen (1984) and Gill, Murray, Saunders, Stewart and Wright (1985) consider this issue and suggest several procedures for computing $Z_k$, based in part on information at previous iterates, which guarantee that $Z$ varies smoothly.

The approach of Coleman and Sorensen is to obtain $Z_k$ by computing a QR factorization of $A_k$, in which the inherent arbitrary sign choices in the factorization algorithm are made, if $A_k$ is sufficiently close to $A_{k-1}$, the same way as they were done in computing $Z_{k-1}$ from $A_{k-1}$. If $\{x_k\} \to x_*$, then for $k$ sufficiently large all the matrices $A_k$ will be close enough together that the same sign choices will be made at each step. Therefore, for the rest of the sequence we have $Z_k = z(A_k)$ where $z$ is a smooth function of $n \times (n - t)$ matrices in a neighborhood of $A(x_*)$. This implies that there is a constant $a_*$ such that $\|Z_k - Z(x_*)\| \le a_*\|x_k - x_*\|$.

Gill, Murray, Saunders, Stewart and Wright (1985) propose applying the orthogonal factor of the QR factorization of $A_{k-1}$ to $A_k$, and then computing the QR factorization of $Q_{k-1}^T A_k$ to get $Q_k$ and thus $Z_k$. They show that with this method

$\|Z_{k+1} - Z_k\| \le \hat a\|x_{k+1} - x_k\|$,

for some constant $\hat a$. If we consider the null space bases at two iterates $x_k$ and $x_j$, with $j < k$, we have

$\|Z_k - Z_j\| \le \sum_{i=j}^{k-1}\|Z_{i+1} - Z_i\| \le \hat a\sum_{i=j}^{k-1}\|x_{i+1} - x_i\|$.

If the sequence $\{x_k\}$ converges R-linearly, then the sum $\sum_i\|x_{i+1} - x_i\|$ is finite. Therefore, we must have that $\|Z_k - Z_j\| \to 0$ as $j$ and $k$ go to infinity. This means that $\{Z_k\}$ is a Cauchy sequence, and must thus converge to some matrix $Z_*$, which by continuity satisfies $A(x_*)^T Z_* = 0$. Therefore for the Gill, Murray, Saunders, Stewart and Wright procedure, as well as for the Coleman and Sorensen procedure, there is a constant $a_*$ such that for all $k$

$\|Z_k - Z_*\| \le a_*\|x_k - x_*\|$, (5.1)

where $Z_*$ is a particular null space basis for $A(x_*)$. As we shall show, the condition (5.1) is all that is required of the null space basis to give superlinear convergence of the reduced Hessian algorithm.
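A minimal sketch of an orthonormal null space basis computed from a QR factorization is given below. It is not the procedure of Coleman and Sorensen or of Gill et al.; the column sign alignment against the previous basis is only a crude stand-in, assumed here for illustration, for the idea of making the arbitrary sign choices consistently from one iterate to the next. The function name `null_space_basis` is hypothetical.

```python
import numpy as np

def null_space_basis(A, Z_prev=None):
    """Orthonormal basis Z for the null space of A^T (A is n x t, full column rank).

    Z is taken as the last n - t columns of the full orthogonal factor of A.
    If a previous basis is supplied, each column is sign-flipped so that it has
    a nonnegative inner product with the corresponding previous column.
    """
    n, t = A.shape
    Q, _ = np.linalg.qr(A, mode="complete")   # Q is n x n orthogonal
    Z = Q[:, t:]
    if Z_prev is not None:
        signs = np.sign(np.sum(Z * Z_prev, axis=0))
        signs[signs == 0] = 1.0
        Z = Z * signs
    return Z

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
Z = null_space_basis(A)
Z_aligned = null_space_basis(A, Z_prev=-Z)
```

Sign flips leave $A^T Z = 0$ and $Z^T Z = I$ intact, so they change only which orthonormal basis is returned, not the subspace it spans.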


5.2  Consistency  of  the  Matrix  Approximation 

Since Algorithm 2.1 approximates only the reduced Hessian $G_*$, one cannot expect it to be 1-step Q-superlinearly convergent. (See the examples of Byrd (1985) and Yuan (1985).) However, results of Powell (1978) show that if $\{x_k\} \to x_*$, if $\alpha_k = 1$ at each step, and if the matrices $B_k$ satisfy

$\omega_k \equiv \frac{\|(B_k - G_*)s_k\|}{\|x_{k+1} - x_k\|} \to 0$, (5.2)

then Algorithm 2.1 is 2-step superlinearly convergent, i.e.

$\frac{\|x_{k+2} - x_*\|}{\|x_k - x_*\|} \to 0$. (5.3)

In fact, Coleman and Conn (1984) prove that Algorithm 2.1, using the DFP update, satisfies (5.2). Their arguments are based on the theory of Dennis and Moré (1977) and, with some changes, apply to the BFGS method also. However, it is also possible to obtain (5.2) using the techniques of Byrd and Nocedal (1987), as we now show.


Theorem 5.1 Suppose that Assumptions 4.1 hold at $x_*$, and that the iterates $\{x_k\}$ generated by Algorithm 2.1, using any merit function, are contained in a neighborhood of $x_*$ in which (4.1) - (4.3) hold. Furthermore assume that $\{x_k\}$ converges to $x_*$ R-linearly, and that the matrices $Z_k$ satisfy (5.1). Then

$\lim_{k \to \infty}\omega_k = 0$,

and $\{\|B_k\|\}$ and $\{\|B_k^{-1}\|\}$ are bounded.

Proof: If $s_k = 0$ then $\omega_k = 0$. If $s_k \ne 0$, then we have from (4.3) and (2.18) that $y_k^T s_k > 0$. Since $\|\alpha_k h_k\| \le \|x_{k+1} - x_k\|$, and since $\alpha_k \le 1$, we have for any $\tau \in [0, 1]$ that

$\|(x_k + \tau\alpha_k h_k) - x_*\| \le \|e_k\| + \|x_{k+1} - x_k\| \le 2\|e_k\| + \|e_{k+1}\|$,

where $e_k = x_k - x_*$. Using this, (2.18), (4.2) and (5.1) we have

$\frac{\|y_k - G_* s_k\|}{\|s_k\|} = \frac{\|(\bar G_k - G_*)s_k\|}{\|s_k\|} \le \hat a\max(\|e_{k+1}\|, \|e_k\|)$,

for some constant $\hat a$. Due to the R-linear convergence, $\sum_{k=1}^{\infty}\|e_k\| < \infty$. We can therefore apply Theorem 3.2 of Byrd and Nocedal (1987) to obtain (5.2), since $\|x_{k+1} - x_k\| \ge \|s_k\|$, and to conclude that $\{\|B_k\|\}$ and $\{\|B_k^{-1}\|\}$ are bounded.

□ 
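The quantities in Theorem 5.1 can be made concrete with the standard BFGS formula, $B_+ = B - \frac{B s s^T B}{s^T B s} + \frac{y y^T}{y^T s}$. The paper's own update step and its definition (2.18) of $y_k$ are not restated in this section, so the sketch below is only illustrative: skipping the update when $y^T s \le 0$ is an assumption made here, the names `bfgs_update` and `omega` are hypothetical, and the matrix `G_star` is made-up test data standing in for the reduced Hessian.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of a positive definite matrix B.

    The update is skipped when y^T s <= 0 (an assumed safeguard); the text
    shows that y^T s > 0 holds near the solution whenever s is nonzero.
    """
    ys = y @ s
    if ys <= 0:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / ys

def omega(B, G_star, s, step_norm):
    """The measure ||(B - G_star) s|| / ||x_{k+1} - x_k|| of the form (5.2)."""
    return np.linalg.norm((B - G_star) @ s) / step_norm

# Illustrative data: a fixed positive definite "reduced Hessian" and one step.
G_star = np.diag([1.0, 2.0, 3.0])
s = np.array([0.3, -0.2, 0.5])
y = G_star @ s                      # exact curvature along s, for illustration
B0 = np.eye(3)
B1 = bfgs_update(B0, s, y)
w0 = omega(B0, G_star, s, np.linalg.norm(s))
w1 = omega(B1, G_star, s, np.linalg.norm(s))
```

In this idealized setting the updated matrix satisfies the secant equation $B_1 s = y = G_* s$, so the measure along the step just taken drops to zero; the theorem concerns the harder statement that $\omega_k \to 0$ along the actual iterates.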

This theorem implies that, if $\alpha_k = 1$ at each step, then the sequence $\{x_k\}$ converges 2-step superlinearly to $x_*$. However, it turns out that with the $\ell_1$ merit function (2.19), even very close to $x_*$, a steplength of 1 may not satisfy the steplength condition (2.2) in Algorithm 2.1. As pointed out in Chamberlain et al. (1982) this "Maratos effect" can slow the convergence rate. To ensure that eventually $\alpha_k = 1$ is used at each step some slight modifications of Algorithm 2.1 must be made when using the $\ell_1$ merit function. We discuss two of them, the correction step and the watchdog technique, in §5.4 and §5.5. Before doing so we will show that these difficulties do not arise with Fletcher's merit function.


5.3  Fletcher’s  Merit  Function 

Since this merit function is differentiable with a strong local minimizer at $x_*$, one can
show that for all sufficiently large $k$ the algorithm accepts steplengths of 1, provided the
weight $\nu$ is large enough. To show this and to establish the results of the next sections it
is useful to first prove the following technical lemma about the decrease in the Lagrangian
function produced by a single step of the algorithm.

Lemma 5.2 Suppose that Assumptions 4.1 hold at $x_*$ and that the matrices $Z_k$ satisfy
(5.1). If $x_k$ is sufficiently close to $x_*$, and if $\omega_k$ defined by (5.2) is sufficiently small, then

$$
\tfrac{m}{2}\|h_k\| - \|v_k\| \le \|Z_k^T g_k\| \le 2M\|h_k\| + \|v_k\|, \tag{5.4}
$$

and therefore

$$
\|d_k\| = O(\|e_k\|). \tag{5.5}
$$

Moreover, for any $\eta < \tfrac12$ there exist constants $\gamma$ and $\bar\gamma$ such that for $e_k$ and $\omega_k$ sufficiently
small,

$$
L(x_k + d_k, \lambda_k) \le L(x_k, \lambda_k) + \eta g_k^T h_k - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2. \tag{5.6}
$$

Proof: Since $s_k = \alpha_k Z_k^T h_k$ and $B_k s_k = -\alpha_k Z_k^T g_k$, we have from the definition of $\omega_k$

$$
\frac{\|h_k\|}{\|G_*^{-1}\|} - \omega_k(\|h_k\| + \|v_k\|) \le \|Z_k^T g_k\| \le \|G_*\|\,\|h_k\| + \omega_k(\|h_k\| + \|v_k\|).
$$

If $\omega_k$ is small enough, and using (4.3), we obtain (5.4). The left inequality in (5.4)
together with (2.9) and (4.1) give (5.5).

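By Taylor's theorem

$$
L(x_k + d_k, \lambda_k) = L(x_k, \lambda_k) + \nabla_x L(x_k, \lambda_k)^T d_k + \tfrac12 d_k^T \nabla_{xx}^2 L(z, \lambda_k) d_k,
$$

where $z = x_k + \tau d_k$ for some $\tau \in (0, 1)$. From (2.9) and (2.21) we have that $\nabla_x L(x_k, \lambda_k)^T v_k = 0$.
Therefore, since the second derivatives of $f$ and $c$ are bounded near $x_*$, we have by
(2.7) and (2.8)

$$
\begin{aligned}
L(x_k + d_k, \lambda_k) &\le L(x_k, \lambda_k) + g_k^T h_k + \tfrac12 h_k^T \nabla_{xx}^2 L(z, \lambda_k) h_k + a_1\|v_k\|(2\|h_k\| + \|v_k\|) \\
&\le L(x_k, \lambda_k) + \eta g_k^T h_k \\
&\quad - (1 - \eta)\left[h_k^T Z_k (B_k - G_*) Z_k^T h_k + h_k^T Z_k G_* Z_k^T h_k\right] \\
&\quad + \tfrac12 h_k^T \nabla_{xx}^2 L(z, \lambda_k) h_k + a_1\|v_k\|(2\|h_k\| + \|v_k\|),
\end{aligned}
$$

for some constant $a_1$. From the definition of $\omega_k$

$$
|h_k^T Z_k (B_k - G_*) Z_k^T h_k| \le \|h_k\|(\|h_k\| + \|v_k\|)\,\omega_k,
$$

and therefore

$$
\begin{aligned}
L(x_k + d_k, \lambda_k) &\le L(x_k, \lambda_k) + \eta g_k^T h_k + (1 - \eta)\|h_k\|(\|h_k\| + \|v_k\|)\,\omega_k \\
&\quad + (1 - \eta) h_k^T Z_k\left[Z_k^T \nabla_{xx}^2 L(z, \lambda_k) Z_k - G_*\right] Z_k^T h_k \\
&\quad - (\tfrac12 - \eta) h_k^T \nabla_{xx}^2 L(z, \lambda_k) h_k + a_1\|v_k\|(2\|h_k\| + \|v_k\|).
\end{aligned}
\tag{5.7}
$$

Using (4.3), (5.5) and (5.1) we have

$$
\begin{aligned}
L(x_k + d_k, \lambda_k) &\le L(x_k, \lambda_k) + \eta g_k^T h_k + (1 - \eta)\|h_k\|(\|h_k\| + \|v_k\|)\,\omega_k \\
&\quad + a_2\|h_k\|^2\|e_k\| - (\tfrac12 - \eta)\|h_k\|^2 m \\
&\quad + a_1\|v_k\|(2\|h_k\| + \|v_k\|),
\end{aligned}
$$

for some constant $a_2$. Thus if $\|e_k\|$ and $\omega_k$ are sufficiently small

$$
L(x_k + d_k, \lambda_k) \le L(x_k, \lambda_k) + \eta g_k^T h_k - \tfrac12(\tfrac12 - \eta)\|h_k\|^2 m + a_1\|v_k\|(3\|h_k\| + \|v_k\|). \tag{5.8}
$$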
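By the geometric/arithmetic mean inequality,

$$
\|h_k\|\|v_k\| = \left[\left(\frac{(\tfrac12 - \eta)m}{6a_1}\right)^{1/2}\|h_k\|\right]\left[\left(\frac{6a_1}{(\tfrac12 - \eta)m}\right)^{1/2}\|v_k\|\right]
\le \frac{(\tfrac12 - \eta)m}{12a_1}\|h_k\|^2 + \frac{3a_1}{(\tfrac12 - \eta)m}\|v_k\|^2.
$$

Substituting this into (5.8) we obtain

$$
L(x_k + d_k, \lambda_k) \le L(x_k, \lambda_k) + \eta g_k^T h_k - \tfrac14(\tfrac12 - \eta)m\|h_k\|^2 + \left[a_1 + \frac{9a_1^2}{(\tfrac12 - \eta)m}\right]\|v_k\|^2.
$$

From (5.4) we have that $\|h_k\|^2 \ge \frac{1}{4M^2}(\|Z_k^T g_k\| - \|v_k\|)^2 \ge \frac{1}{4M^2}(\tfrac12\|Z_k^T g_k\|^2 - \|v_k\|^2)$. Using
this, (2.9) and (4.3) we have

$$
L(x_k + d_k, \lambda_k) \le L(x_k, \lambda_k) + \eta g_k^T h_k - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2,
$$

for some constants $\gamma$ and $\bar\gamma$.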

□ 

It is interesting to note from this result that the Lagrangian is decreased, unless the
term $\bar\gamma\|c_k\|^2$ is large. This term occurs because the point $x_*$ is not in general a local
minimizer of $L(x, \lambda_*)$ but may be a saddle point; thus the $v_k$ component of the step,
which decreases $\|c(x)\|$, may actually increase the Lagrangian. This fact prevents the
Lagrangian from serving as a good merit function. It appears that a good merit function
must have a term which gives sufficient weight to decreases in the value of $c(x)$, and it
can be seen that both merit functions considered here are equal to the Lagrangian plus
a term dependent on $\|c\|$.
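
The saddle point behaviour can be seen on a two-variable example. The following sketch is not from the paper: the problem (minimize $x_1^2 - x_2^2$ subject to $x_2 = 0$, with solution $x_* = (0, 0)$ and multiplier $\lambda_* = 0$), the starting point and the weight $\mu = 1$ are assumptions chosen so that the arithmetic is exact.

```python
def lagrangian(x, lam):
    # L(x, lambda) = f(x) + lambda * c(x), with f(x) = x1^2 - x2^2 and c(x) = x2
    return x[0] ** 2 - x[1] ** 2 + lam * x[1]

def l1_merit(x, mu):
    # Lagrangian-like objective plus a term that penalizes ||c(x)||_1
    return x[0] ** 2 - x[1] ** 2 + mu * abs(x[1])

lam_star = 0.0              # multiplier at the solution x_* = (0, 0)
x = (0.0, 0.5)              # infeasible point with c(x) = 0.5
v = (0.0, -x[1])            # normal step v = -A (A^T A)^{-1} c, with A = (0, 1)^T
x_new = (x[0] + v[0], x[1] + v[1])

lagrangian_change = lagrangian(x_new, lam_star) - lagrangian(x, lam_star)
merit_change = l1_merit(x_new, 1.0) - l1_merit(x, 1.0)
print(lagrangian_change, merit_change)
```

The normal step removes the infeasibility but raises $L(x, \lambda_*)$, whereas the penalized function decreases once the weight on $\|c\|$ is large enough (here any $\mu > 0.5$ would do).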

Looking  at  the  Fletcher  merit  function  in  this  way  and  using  Lemma  5.2  we  can  prove 
superlinear  convergence. 

Theorem 5.3 Suppose that Assumptions 4.1 hold at $x_*$, and that Algorithm 2.1, using
Fletcher's merit function, generates a sequence $\{x_k\}$ which converges R-linearly to $x_*$.
Assume also that the matrices $Z_k$ satisfy (5.1). Then, if for all sufficiently large $k$ the
weight has a fixed value $\nu$, which is large enough, the rate of convergence is two-step
Q-superlinear.

Proof: We only need to show that for all sufficiently large $k$ the point $x_{k+1} = x_k + d_k$
satisfies the line search condition (2.2), for Theorem 5.1 and the results of Powell (1978)
then imply (5.3).

By (5.5) we have that

$$
\|c_{k+1}\| \le \|c_k + A_k^T d_k\| + O(\|d_k\|^2) \le a_3\|e_k\|^2, \tag{5.9}
$$

for some constant $a_3$. Using this, (2.20), (5.6), (5.5) and (2.29) we obtain

$$
\begin{aligned}
\Phi_\nu(x_{k+1}) &= L(x_{k+1}, \lambda_k) + \left[L(x_{k+1}, \lambda_{k+1}) - L(x_{k+1}, \lambda_k)\right] + \tfrac12\nu\|c_{k+1}\|^2 \\
&\le L(x_k, \lambda_k) + \eta g_k^T h_k - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 \\
&\quad + \|\lambda_{k+1} - \lambda_k\|\,\|c_{k+1}\| + \tfrac12\nu\|c_{k+1}\|^2 \\
&\le \Phi_\nu(x_k) - \tfrac12\nu\|c_k\|^2 + \eta\left[g_k^T h_k + d_k^T {\lambda_k'}^T c_k - \nu\|c_k\|^2\right] - \eta\, d_k^T {\lambda_k'}^T c_k \\
&\quad + \eta\nu\|c_k\|^2 - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 + a_4\|e_k\|^3 \\
&\le \Phi_\nu(x_k) + \eta D\Phi_\nu(x_k; d_k) - \left\{\left[(\tfrac12 - \eta)\nu - \bar\gamma\right]\|c_k\|^2 + \gamma\|Z_k^T g_k\|^2\right\} \\
&\quad + a_5\eta\|e_k\|\|c_k\| + a_4\|e_k\|^3,
\end{aligned}
\tag{5.10}
$$

for some constants $a_4, a_5$. Using Lemma 4.1 and the geometric/arithmetic mean inequality
(as in the proof of Lemma 5.2), we see that there is a constant $a_6$ such that

$$
a_5\eta\|e_k\|\|c_k\| \le a_6\|c_k\|^2 + \tfrac12\gamma\|Z_k^T g_k\|^2,
$$

from which one can show that, if $\nu$ is sufficiently large, $a_5\eta\|e_k\|\|c_k\|$ is less than half the
term inside the curly brackets. Also, if $\|e_k\|$ is sufficiently small, we have from Lemma 4.1
that the last term in (5.10) is less than half the term inside the curly brackets. Therefore

$$
\Phi_\nu(x_{k+1}) \le \Phi_\nu(x_k) + \eta D\Phi_\nu(x_k; d_k),
$$

and the unit steplength is accepted by the algorithm.

□ 

5.4  The  Second  Order  Correction  Technique 

Since the difficulty with the $\ell_1$ merit function is caused by the nondifferentiability of
the term $\|c(x)\|_1$, a very simple measure is to add to the step a correction of the form

$$
w_k = -A_k (A_k^T A_k)^{-1} c(x_k + d_k).
$$

This is very similar to strategies proposed by Coleman and Conn (1982), Fletcher (1982),
Gabay (1982) and Mayne and Polak (1982) to deal with this problem. The effect of this
correction step, which is normal to the constraints, is to decrease the quantity $\|c(x)\|$ so
that it is of the order of $\|e_k\|^3$. This means that the merit function will then be decreased
at the point $x_k + d_k + w_k$, as we will show.
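
The effect of the correction can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: the test problem (minimize $2(x_1^2 + x_2^2 - 1) - x_1$ on the unit circle), the choice $B_k = 1$, the values $\mu = 2$ and $\eta = 0.1$, and the expression $g^T d - \mu\|c\|_1$ for the directional derivative of the $\ell_1$ merit function are all hypothetical choices made for the example.

```python
import math

def f(x):
    return 2.0 * (x[0] ** 2 + x[1] ** 2 - 1.0) - x[0]

def c(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

def grad_f(x):
    return (4.0 * x[0] - 1.0, 4.0 * x[1])

def l1_merit(x, mu):
    return f(x) + mu * abs(c(x))

theta, mu, eta = 0.1, 2.0, 0.1
x = (math.cos(theta), math.sin(theta))    # feasible up to rounding, so the normal step is negligible

z = (-x[1], x[0])                         # unit tangent to the circle at x (null-space basis)
g = grad_f(x)
zg = z[0] * g[0] + z[1] * g[1]
d = (-z[0] * zg, -z[1] * zg)              # d = -Z B^{-1} Z^T g with B = 1

a = (2.0 * x[0], 2.0 * x[1])              # A_k is evaluated at x_k, not at x_k + d_k
xd = (x[0] + d[0], x[1] + d[1])
s = c(xd) / (a[0] ** 2 + a[1] ** 2)
w = (-a[0] * s, -a[1] * s)                # w_k = -A_k (A_k^T A_k)^{-1} c(x_k + d_k)
xc = (xd[0] + w[0], xd[1] + w[1])

slope = g[0] * d[0] + g[1] * d[1] - mu * abs(c(x))   # assumed directional derivative
plain_ok = l1_merit(xd, mu) <= l1_merit(x, mu) + eta * slope
corrected_ok = l1_merit(xc, mu) <= l1_merit(x, mu) + eta * slope

print(f"|c(x+d)|   = {abs(c(xd)):.3e}, sufficient decrease: {plain_ok}")
print(f"|c(x+d+w)| = {abs(c(xc)):.3e}, sufficient decrease: {corrected_ok}")
```

In this example the constraint violation after the corrected step is a small fraction of that after the plain step, and the sufficient decrease test passes only for the corrected point.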

We  therefore  consider  the  following  variation  of  Algorithm  2.1. 

Algorithm  5.1 

The constants $\eta \in (0, \tfrac12)$ and $\tau, \tau'$ with $0 < \tau < \tau' < 1$ are given.

(1) Set $k = 1$ and choose a starting point $x_1$ and a symmetric and positive definite
starting matrix $B_1$.

(2) Compute $d_k$ as the solution of the quadratic program (2.1).

(3) Set $\alpha_k = 1$.

(4) If

$$
\phi_\mu(x_k + \alpha_k d_k) \le \phi_\mu(x_k) + \eta\alpha_k D\phi_\mu(x_k; d_k), \tag{5.11}
$$

set $x_{k+1} = x_k + \alpha_k d_k$ and go to (8).

(5) If (5.11) does not hold and if $\alpha_k < 1$ go to (7).

(6) Compute

$$
w_k = -A_k (A_k^T A_k)^{-1} c(x_k + d_k). \tag{5.12}
$$

If

$$
\phi_\mu(x_k + d_k + w_k) \le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k) \tag{5.13}
$$

holds, set $x_{k+1} = x_k + d_k + w_k$ and go to (8); otherwise go to (7).

(7) Choose a new $\alpha_k$ in $[\tau\alpha_k, \tau'\alpha_k]$ and go to (4).

(8) Update $B_k$ using the BFGS formula (2.6).

(9) Set $k := k + 1$, and go to (2).
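
The step acceptance logic of steps (3)-(7) can be sketched as follows. This is a minimal illustration, not the authors' code: the callables `phi`, `dphi` and `correction`, the fixed contraction factor `tau` (a particular choice within the interval allowed in step (7)) and the safeguard `max_backtracks` are hypothetical names and assumptions introduced here.

```python
def algorithm_5_1_step(phi, dphi, x, d, correction, eta=0.1, tau=0.5, max_backtracks=50):
    """Return (x_next, alpha, used_correction) for one iteration, given the direction d.

    phi(x)           -- merit function value (assumed callable)
    dphi(x, d)       -- directional derivative of the merit function (assumed callable)
    correction(x, d) -- second order correction w for the full step (assumed callable)
    """
    def add(u, w, t=1.0):
        return tuple(ui + t * wi for ui, wi in zip(u, w))

    phi0 = phi(x)
    slope = dphi(x, d)
    alpha = 1.0                                          # step (3)
    for _ in range(max_backtracks):
        trial = add(x, d, alpha)
        if phi(trial) <= phi0 + eta * alpha * slope:     # step (4), test (5.11)
            return trial, alpha, False
        if alpha == 1.0:                                 # step (6): only tried for the unit step
            corrected = add(add(x, d), correction(x, d))
            if phi(corrected) <= phi0 + eta * slope:     # test (5.13)
                return corrected, 1.0, True
        alpha *= tau                                     # step (7): backtrack
    raise RuntimeError("no acceptable steplength found")


# Usage on a one-dimensional toy merit function phi(x) = x^2.
phi = lambda x: x[0] ** 2
dphi = lambda x, d: 2.0 * x[0] * d[0]

full = algorithm_5_1_step(phi, dphi, (1.0,), (-1.0,), lambda x, d: (0.0,))
fixed = algorithm_5_1_step(phi, dphi, (1.0,), (-3.0,), lambda x, d: (2.0,))
backed = algorithm_5_1_step(phi, dphi, (1.0,), (-3.0,), lambda x, d: (0.0,))
print(full, fixed, backed)
```

The three calls exercise the three possible outcomes: the unit step is accepted, the corrected unit step is accepted, or the algorithm falls back to backtracking along $d_k$.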


We will show that after a finite number of iterations backtracking is never needed,
i.e. the step taken by this algorithm is either $x_{k+1} = x_k + d_k$ or $x_{k+1} = x_k + d_k + w_k$,
which will imply superlinear convergence.

First we need to verify that Algorithm 5.1 is locally R-linearly convergent. This is
easy to do, because Algorithm 5.1 differs from Algorithm 2.1 only if the step is accepted
by (5.13), and this test enforces a sufficient reduction in the merit function. To show
that Theorem 4.3 applies we only need to consider an iteration such that $j \in J$ and
$x_{j+1} = x_j + d_j + w_j$. From (5.13) and (3.16) we see that (3.17) holds, and the proof of
Theorem 4.3 applies without change. Therefore Algorithm 5.1 is R-linearly convergent.

Now we argue that Theorem 5.1 also holds for Algorithm 5.1. We consider an iteration
for which the second order correction is used: $x_{k+1} = x_k + d_k + w_k$. Then

$$
\|w_k\| \le \|e_{k+1}\| + \|e_k\|, \tag{5.14}
$$

due to the orthogonality of $w_k$ and $d_k$. Proceeding as in the proof of Theorem 5.1 (except
that $\alpha_k = 1$) we have $x_k + h_k = x_{k+1} - v_k - w_k$, and therefore using (5.14) and (4.1)

$$
\|x_k + h_k - x_*\| \le \|e_{k+1}\| + \gamma_0\|c_k\| + \|e_{k+1}\| + \|e_k\|.
$$

The rest of the proof is identical to that of Theorem 5.1. Therefore we know that for
Algorithm 5.1 condition (5.2) holds and that the matrices $B_k$ and their inverses are
bounded.

We  now  show  that  after  a  finite  number  of  iterations  backtracking  is  never  needed. 


Theorem 5.4 Let Assumptions 4.1 hold at $x_*$. If $x_k$ is sufficiently close to $x_*$ and $\omega_k$,
defined by (5.2), is sufficiently small, then

$$
\phi_\mu(x_k + d_k + w_k) \le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k). \tag{5.15}
$$

Proof: From (5.12), (4.1) and (5.9) we have

$$
\|w_k\| = O(\|e_k\|^2). \tag{5.16}
$$

Since $\nabla_x L(x_k, \lambda_k)^T w_k = 0$, and using (5.5), we have

$$
\begin{aligned}
L(x_k + d_k + w_k, \lambda_k) - L(x_k + d_k, \lambda_k) &= \nabla_x L(x_k + d_k + \tau w_k, \lambda_k)^T w_k \\
&= \nabla_x L(x_k, \lambda_k)^T w_k + O(\|d_k + \tau w_k\|\|w_k\|) \\
&= O(\|e_k\|^3),
\end{aligned}
\tag{5.17}
$$

for some $\tau \in (0, 1)$. Similarly

$$
c(x_k + d_k + w_k) = c(x_k + d_k) + A_k^T w_k + \int_0^1 \left[A(x_k + d_k + \tau w_k) - A_k\right]^T w_k \, d\tau.
$$

Since the first two terms on the right hand side cancel, we have from (5.16) and (5.5)

$$
\|c(x_k + d_k + w_k)\| = O(\|e_k\|^3). \tag{5.18}
$$

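Now

$$
\begin{aligned}
\phi_\mu(x_k + d_k + w_k) &= f(x_k + d_k + w_k) + \lambda_k^T c(x_k + d_k + w_k) + \mu\|c(x_k + d_k + w_k)\|_1 \\
&\quad - \lambda_k^T c(x_k + d_k + w_k) \\
&\le L(x_k + d_k + w_k, \lambda_k) + (\mu + \|\lambda_k\|_\infty)\|c(x_k + d_k + w_k)\|_1 \\
&\le L(x_k + d_k + w_k, \lambda_k) + O(\|e_k\|^3).
\end{aligned}
\tag{5.19}
$$

Using (5.17), (5.6) and (2.26)

$$
\begin{aligned}
\phi_\mu(x_k + d_k + w_k) &\le L(x_k + d_k, \lambda_k) + O(\|e_k\|^3) \\
&\le L(x_k, \lambda_k) + \eta g_k^T h_k - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 + O(\|e_k\|^3) \\
&= f_k + \mu\|c_k\|_1 + \lambda_k^T c_k - \mu\|c_k\|_1 + \eta g_k^T h_k - \gamma\|Z_k^T g_k\|^2 \\
&\quad + \bar\gamma\|c_k\|^2 + O(\|e_k\|^3) \\
&= \phi_\mu(x_k) + \eta\left[g_k^T h_k + \lambda_k^T c_k - \mu\|c_k\|_1\right] + (1 - \eta)\lambda_k^T c_k \\
&\quad - (1 - \eta)\mu\|c_k\|_1 - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 + O(\|e_k\|^3) \\
&\le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k) - (1 - \eta)\rho\|c_k\|_1 - \gamma\|Z_k^T g_k\|^2 \\
&\quad + \bar\gamma\|c_k\|^2 + O(\|e_k\|^3).
\end{aligned}
$$

Assuming that $\|c_k\| \le (1 - \eta)\rho/(2\bar\gamma)$, we have

$$
\phi_\mu(x_k + d_k + w_k) \le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k) - \left\{\tfrac12(1 - \eta)\rho\|c_k\|_1 + \gamma\|Z_k^T g_k\|^2\right\} + O(\|e_k\|^3).
$$

By (4.4), if $\|e_k\|$ is sufficiently small, the last term is smaller in magnitude than the term
inside the curly brackets.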

□ 

Now we need to show that Powell's condition (5.2) implies 2-step Q-superlinear
convergence also for Algorithm 5.1, if for all large $k$ backtracking is not used.


Theorem 5.5 Suppose that Assumptions 4.1 hold at $x_*$, and that Algorithm 5.1
generates a sequence $\{x_k\}$ which converges R-linearly to $x_*$. Assume also that the matrices
$Z_k$ satisfy (5.1). Then the rate of convergence is two-step Q-superlinear.

Proof: Since we have shown that the matrices $B_k$ and their inverses are bounded,
Theorem 4.1 of Nocedal and Overton (1985) gives

$$
\|x_{k-1} + d_{k-1} - x_*\| \le C_1\|e_{k-1}\| \tag{5.20}
$$

for some constant $C_1$. Note also that by (5.9)

$$
\|c(x_{k-1} + d_{k-1})\| \le a_3\|e_{k-1}\|^2. \tag{5.21}
$$

Now, if the second order correction is used at step $k - 1$, by (5.16) it satisfies $\|w_{k-1}\| =
O(\|e_{k-1}\|^2)$. Therefore regardless of whether the correction step was used we have from
(5.20) and (5.21) that

$$
\|e_k\| \le O(\|e_{k-1}\|) \tag{5.22}
$$

and

$$
\|c_k\| \le O(\|e_{k-1}\|^2). \tag{5.23}
$$

Now Lemma 6 of Powell (1978) implies that for any step on a quadratic program of the
form (2.1) at $x_k$, under Assumptions 4.1, we have

$$
\begin{aligned}
\|x_k + d_k - x_*\| &\le O(\|c_k\|) + O(\|d_k\|^2) + O(\|Z_k[G_k - B_k]Z_k^T d_k\|) \\
&\le O(\|c_k\|) + O(\|e_k\|^2) + O(\omega_k\|d_k\|) \\
&\le O(\|e_{k-1}\|^2) + O(\omega_k\|e_{k-1}\|),
\end{aligned}
$$

by (5.5), (5.22) and (5.23). If the second order correction is used at $x_k$ then by (5.16)
$\|w_k\| = O(\|e_k\|^2) = O(\|e_{k-1}\|^2)$, so that whether a correction step is taken or not,

$$
\|e_{k+1}\| \le O(\|e_{k-1}\|^2) + O(\omega_k\|e_{k-1}\|). \tag{5.24}
$$

Since we have shown that $\omega_k \to 0$, we conclude from (5.24) that

$$
\|e_{k+1}\| / \|e_{k-1}\| \to 0.
$$


□ 
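
A two-step Q-superlinear rate is weaker than a one-step rate: the ratio over two iterations tends to zero, but individual iterations need not make progress. The sketch below illustrates this with a hypothetical error sequence $e_k = 1/\lfloor k/2 \rfloor!$, chosen only for the illustration and unrelated to any particular run of the algorithm.

```python
import math

def error(k):
    # Hypothetical error sequence: each value is repeated twice before decreasing.
    return 1.0 / math.factorial(k // 2)

# One-step ratios e_{k+1} / e_k: equal to 1 whenever k is even, so they do not tend to 0.
one_step = [error(k + 1) / error(k) for k in range(0, 12)]

# Two-step ratios e_{k+1} / e_{k-1}: equal to 1 / ((k - 1) // 2 + 1), which tends to 0.
two_step = [error(k + 1) / error(k - 1) for k in range(1, 12)]

print("one-step:", [round(r, 3) for r in one_step])
print("two-step:", [round(r, 3) for r in two_step])
```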

It is interesting to note that, if the correction step is tried at every iteration, the result
of Byrd (1984) applies, giving a better convergence rate for the sequence $\{x_k + d_k\}$.


Theorem 5.6 Consider a modification to Algorithm 5.1 such that, at every iteration,
$w_k$ is computed and if (5.13) holds then $x_{k+1} = x_k + d_k + w_k$. For this iteration,
under the conditions of Theorem 5.5, the sequence $\{x_k + d_k\}$ converges to $x_*$ one-step
Q-superlinearly, that is

$$
\lim_{k \to \infty} \frac{\|x_{k+1} + d_{k+1} - x_*\|}{\|x_k + d_k - x_*\|} = 0. \tag{5.25}
$$

Proof: By Theorem 5.4, for $k$ sufficiently large a full corrected step is taken so that
$x_{k+1} = x_k + d_k + w_k$. The iteration is then equivalent to Algorithm 3 discussed by
Byrd (1984) with the full Hessian approximation of that algorithm given by $Z_k B_k Z_k^T$.
By Theorem 3.5 of that paper, since R-linear convergence implies boundedness of the
Hessian approximations, (5.25) holds.

5.5  The  Watchdog  Technique 

To avoid the inefficiencies caused by the Maratos effect, Chamberlain et al. (1982)
propose to sometimes accept the unit steplength even if this results in an increase in the
$\ell_1$ merit function. They call this a "relaxed step". However, if after $t$ steps a sufficient
reduction has not been obtained, they go back to the iterate where the relaxed step was
performed. We now describe a special case of this watchdog algorithm in which $t = 1$. For
simplicity we will assume that the matrix is updated at each iterate along the direction
moved to reach that iterate, even though in practice it may be preferable not to do so
at certain iterates that will be rejected. We note that an update at $x_{k+1}$ is always done
using information from the immediately preceding step $x_{k+1} - x_k$. The algorithm uses
the $\ell_1$ merit function with the weight $\mu_k$ adjusted by (2.27); however, in the description
that follows we omit the subscripts of $\mu$, for simplicity.

Watchdog  Algorithm 

The constant $\eta \in (0, \tfrac12)$ is given.

(0) Choose a starting point $x_1$ and a symmetric and positive definite starting matrix
$B_1$. Set $k := 1$ and let $S = \{1\}$.

(1) Compute $x_{k+1} = x_k + d_k$, where $d_k$ is the solution of (2.1). Update $B_k$ by means
of (2.6) to obtain $B_{k+1}$.

(2) Test the condition

$$
\phi_\mu(x_{k+1}) \le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k). \tag{5.26}
$$

If (5.26) holds, set $k := k + 1$, $S = S \cup \{k\}$, and go to (1).

(3) Compute $x_{k+2} = x_{k+1} + \alpha_{k+1} d_{k+1}$, where $d_{k+1}$ solves (2.1) and $\alpha_{k+1}$ is such that

$$
\phi_\mu(x_{k+2}) \le \phi_\mu(x_{k+1}) + \eta\alpha_{k+1} D\phi_\mu(x_{k+1}; d_{k+1}). \tag{5.27}
$$

Update $B_{k+1}$ to get $B_{k+2}$.

(4) If

$$
\phi_\mu(x_{k+1}) \le \phi_\mu(x_k) \tag{5.28}
$$

or

$$
\phi_\mu(x_{k+2}) \le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k), \tag{5.29}
$$

set $k := k + 2$, $S = S \cup \{k\}$, and go to (1).

(5) If $\phi_\mu(x_{k+2}) > \phi_\mu(x_k)$ compute $x_{k+3} = x_k + \alpha_k d_k$, where $\alpha_k$ is such that

$$
\phi_\mu(x_{k+3}) \le \phi_\mu(x_k) + \eta\alpha_k D\phi_\mu(x_k; d_k). \tag{5.30}
$$

If $\phi_\mu(x_{k+2}) \le \phi_\mu(x_k)$, compute $d_{k+2}$ by solving (2.1), let $x_{k+3} = x_{k+2} + \alpha_{k+2} d_{k+2}$,
where $\alpha_{k+2}$ is such that

$$
\phi_\mu(x_{k+3}) \le \phi_\mu(x_{k+2}) + \eta\alpha_{k+2} D\phi_\mu(x_{k+2}; d_{k+2}). \tag{5.31}
$$

Update $B_{k+2}$ to get $B_{k+3}$, set $k := k + 3$, $S = S \cup \{k\}$, and go to (1).

The set $S$ is not required by the algorithm and is introduced only to facilitate the
analysis. It identifies the iterates for which a sufficient merit function reduction was
obtained. Note that at least one third of the iterates have their indices in $S$.
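
The control flow of steps (1)-(5) can be sketched as below. This is a simplified illustration rather than the authors' implementation: it omits the BFGS updates, the set $S$ and the adjustment of the weight $\mu$, and the callables `phi`, `dphi` and `direction`, the halving factor `tau` and the cap `max_iter` are hypothetical names and assumptions introduced for the example.

```python
def backtrack(phi, dphi, x, d, eta, tau=0.5, max_iter=50):
    """First alpha in {1, tau, tau^2, ...} satisfying the sufficient decrease test."""
    phi0, slope, alpha = phi(x), dphi(x, d), 1.0
    for _ in range(max_iter):
        trial = tuple(xi + alpha * di for xi, di in zip(x, d))
        if phi(trial) <= phi0 + eta * alpha * slope:
            return trial
        alpha *= tau
    raise RuntimeError("line search failed")


def watchdog_cycle(phi, dphi, direction, x, eta=0.1):
    """One pass through steps (1)-(5); returns (new iterate, iterates generated)."""
    d = direction(x)
    x1 = tuple(xi + di for xi, di in zip(x, d))          # step (1): relaxed unit step
    bound = phi(x) + eta * dphi(x, d)
    if phi(x1) <= bound:                                 # step (2): test (5.26)
        return x1, 1
    x2 = backtrack(phi, dphi, x1, direction(x1), eta)    # step (3): line search (5.27)
    if phi(x1) <= phi(x) or phi(x2) <= bound:            # step (4): (5.28) or (5.29)
        return x2, 2
    if phi(x2) > phi(x):                                 # step (5): go back to x_k, (5.30)
        return backtrack(phi, dphi, x, d, eta), 3
    return backtrack(phi, dphi, x2, direction(x2), eta), 3   # step (5): continue, (5.31)


# Usage on a one-dimensional toy merit function phi(x) = x^2.
phi = lambda x: x[0] ** 2
dphi = lambda x, d: 2.0 * x[0] * d[0]

accepted = watchdog_cycle(phi, dphi, lambda x: (-x[0],), (1.0,))
relaxed = watchdog_cycle(phi, dphi, lambda x: (-2.5 * x[0],), (1.0,))
returned = watchdog_cycle(
    phi, dphi, lambda x: (-2.5 * x[0],) if x[0] > 0 else (0.1,), (1.0,)
)
print(accepted, relaxed, returned)
```

The three calls exercise, in turn, immediate acceptance in step (2), acceptance of the relaxed step through (5.29), and the return to $x_k$ in step (5).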

For this algorithm it is possible to establish the R-linear convergence of the iterates
in $S$, that is the set of iterates that satisfy a sufficient decrease condition. However, the
Watchdog Algorithm updates $B_k$ at every iteration, and in order to conclude that $\omega_k \to 0$
we must have that

$$
\sum_{k=0}^{\infty} \|e_k\| < \infty,
$$

where the sum is taken over all the iterates. It appears to be possible that when $B_k$ is
updated in step (1) at a point $x_{k+1}$ that fails the test (5.26), $x_{k+1}$ may be much farther
from the solution than $x_k$, so that updating along $d_k$ will move $B_{k+1}$ away from the true
Hessian. To avoid this difficulty and ensure R-linear convergence of all the iterates we
now change the algorithm so that a point $x_{k+1}$ that fails to satisfy (5.26) is accepted only
if it satisfies

$$
\|Z_{k+1}^T g_{k+1}\| + \|c_{k+1}\| \le 2(\|Z_k^T g_k\| + \|c_k\|), \tag{5.32}
$$

where the factor 2 is an arbitrary parameter. Otherwise, we do a line search and revoke
the update of step (2). In the Watchdog Algorithm this amounts to adding the following
after step (2).

(2a) If (5.26) does not hold, and (5.32) is not satisfied, then compute $\alpha$ such that

$$
\phi_\mu(x_k + \alpha d_k) \le \phi_\mu(x_k) + \eta\alpha D\phi_\mu(x_k; d_k), \tag{5.33}
$$

update $B_k$ to get $B_{k+1}$, set $x_{k+1} = x_k + \alpha d_k$, $k := k + 1$, $S = S \cup \{k\}$, and go to (1).

For  this  modified  algorithm  we  are  able  to  prove  R-linear  convergence  of  the  entire 
sequence. 

Lemma 5.7 Let $\{x_k\}$ be generated by the Watchdog Algorithm using the additional step
(2a). Suppose that $x_*$ satisfies Assumptions 4.1, and that for all $k$ greater than some
index $k_0$, the weight $\mu_k$ has a constant value $\mu$ and the iterates $x_k$ are contained in a
neighborhood of $x_*$ for which Lemma 4.2, and (4.1)-(4.3) hold. Then $\{x_k\} \to x_*$, and
there exists $r < 1$ and $a_8$ such that for any $k \ge k_0$

$$
\phi_\mu(x_k) - \phi_\mu(x_*) \le a_8 r^{k - k_0}. \tag{5.34}
$$

Therefore

$$
\sum_{k=0}^{\infty} \|x_{k+1} - x_*\| < \infty, \tag{5.35}
$$

and $\omega_k \to 0$.


Proof: Let $S = \{l_1, l_2, \ldots\}$. From (5.26), (5.29), (5.30) and (5.31) we see that for any
$l_i > 0$ there is an integer $j_i$, such that $1 \le j_i \le l_i - l_{i-1} \le 3$, and such that

$$
\phi_\mu(x_{l_i}) \le \phi_\mu(x_{l_i - j_i}) + \eta\alpha D\phi_\mu(x_{l_i - j_i}; d_{l_i - j_i}), \tag{5.36}
$$

where  a  is  a  steplength  computed  by  the  algorithm.  We  also  see  that  the  inequality 

(5.37) 


holds  for  ji. 

Now suppose $l_i - j_i \in J$ so that (3.16) holds. Either $\alpha = 1$ or a backtracking linesearch
was done along $d_{l_i - j_i}$ to determine $\alpha$, and in either case the arguments in the proof of
Lemma 3.3 together with (5.36) imply that

+  Ik-;, 111].  (5.38) 

for  some  constant  y.  Now  (5.38)  together  with  (4.8)  and  then  (5.37)  imply 

<  >o[0u(xl.-i)  ~  0u(x-)i  (5-39) 

where  =  1  —  ^  <  1.  Theorem  3.1  implies  that  J  n  [1 ,  A:]  contains  at  least  |Ar  iterates, 

that  is  [1,  A]  contains  at  most  |  elements  not  in  J.  Therefore  |5n7n[l,  fc]|  >  |5n[l,  /,*](- 1. 
The  structure  of  the  watchdog  procedure  implies  that  |  <  |5  H  (1,  Ar]|  so  that 

|S n  J n  [i,fc]  |  >  i|5n(i,*]|. 

Therefore  (5.39)  holds  for  at  least  half  of  the  elements  in  5,  and  since  {^(xij)  is  a 
decreasing  sequence,  we  have  that 

<M**)  -  <M**)  <  ro~l[<M-ri)  -  (5.40) 


holds  for  all  k  £  5. 


Now we will show that step (2a) ensures that (5.34) holds for all the iterates. To
show this we divide the iterates into three groups: (i) $S$; (ii) $S_1 = \{k \notin S : k - 1 \in S\}$;
(iii) $S_2$, the set of indices of the remaining iterates (note that if $k \in S_2$ then $k - 1 \in S_1$).
Now if $k \in S$, we have from (5.40) that (5.34) holds. If $k \in S_1$ is large enough, we have
from (4.8), (5.32), (4.4), again (4.8) and (5.40)

<t>u(Xk)  ~  d»M(x.)  ^  74[||^rf+IMIi] 

^  274[||zj_10*_i||  +  ||c*-i||i] 

<  27472||e*;_1|| 

<  ^^(<£M(Xjfc_i)  -  <Mjt.))5 

V73 

2-y^-Vo  l 

<  — j=rr0J  )  -  <pu(z.))2  (5.41) 

y/13 

so  that  (5.34)  is  satisfied  for  r  >  y/rf>  and  a8  >  (<t>u( x\ )  -  d>M(x.))?.  If  k  6  S2 

then  On(xk)  <  <t>u(xk- 1)  and  xfc-i  €  Si,  which  gives  (5.34)  for  some  r  less  than  1. 

We obtain $\sum_{k=0}^{\infty} \|x_{k+1} - x_*\| < \infty$ as in the proof of Theorem 4.3. The condition
$\omega_k \to 0$ is proved as in Theorem 5.1.

□ 

Theorem 5.8 Let Assumptions 4.1 hold at $x_*$ and assume that the sequence $\{x_k\}$
generated by the Watchdog Algorithm converges R-linearly to $x_*$. Then for all sufficiently large
$k$ the steplength is $\alpha_k = 1$, and the rate of convergence is 2-step Q-superlinear.

Proof: Consider an iterate $x_k$ at step (1) of the Watchdog Algorithm. The algorithm
then sets $x_{k+1} = x_k + d_k$, and if $x_{k+1}$ satisfies the sufficient decrease condition in step
(2), then it is accepted and the algorithm goes back to step (1). Thus in this case the
algorithm loops using $\alpha_k = 1$.

Let us now assume that the sufficient decrease condition is not satisfied at $x_{k+1}$. We
will show that, if $e_k$ and $\omega_k$ are sufficiently small, then $x_{k+1}$ will satisfy the test (5.32).
We then show that the line search, which will be made in step (3), will set $\alpha = 1$, and
then in step (4) either (5.28) or (5.29) will be satisfied. Thus $x_{k+1}$ and $x_{k+2}$ will both be
accepted with steplengths of 1.

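To do this we first note that, since $\{\lambda_k\}$ is bounded, there is a constant $\hat\gamma$ such that
$\mu + \|\lambda_k\|_\infty \le \hat\gamma$. Also, since $d_k$ is generated by (2.1), we apply Lemma 5.2 to obtain

$$
\begin{aligned}
\phi_\mu(x_{k+1}) &= L(x_{k+1}, \lambda_k) + \mu\|c_{k+1}\|_1 - \lambda_k^T c_{k+1} \\
&\le L(x_k, \lambda_k) + \eta g_k^T h_k - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 + \hat\gamma\|c_{k+1}\|_1 \\
&= f_k + \lambda_k^T c_k + \eta\left[g_k^T h_k + \lambda_k^T c_k - \mu\|c_k\|_1\right] - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 \\
&\quad + \eta\mu\|c_k\|_1 - \eta\lambda_k^T c_k + \hat\gamma\|c_{k+1}\|_1 \\
&\le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k) - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 \\
&\quad + (1 - \eta)\left[\lambda_k^T c_k - \mu\|c_k\|_1\right] + \hat\gamma\|c_{k+1}\|_1 \\
&\le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k) - \gamma\|Z_k^T g_k\|^2 + \left[\bar\gamma\|c_k\| - \rho(1 - \eta)\right]\|c_k\|_1 \\
&\quad + \hat\gamma\|c_{k+1}\|_1.
\end{aligned}
$$

Thus for $k$ large enough we have

$$
\phi_\mu(x_{k+1}) \le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k) - \gamma\|Z_k^T g_k\|^2 - \tfrac12\rho(1 - \eta)\|c_k\|_1 + \hat\gamma\|c_{k+1}\|_1. \tag{5.42}
$$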

Since we assume that the sufficient decrease condition failed from $x_k$ to $x_{k+1}$,

$$
\phi_\mu(x_{k+1}) > \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k),
$$

which together with (5.42) implies

$$
-\gamma\|Z_k^T g_k\|^2 - \tfrac12\rho(1 - \eta)\|c_k\|_1 + \hat\gamma\|c_{k+1}\|_1 > 0. \tag{5.43}
$$

Using (5.9) this implies there exists a constant $\gamma_5$ such that

$$
\|c_k\| \le \gamma_5\|e_k\|^2 \tag{5.44}
$$

whenever $x_{k+1}$ does not satisfy (5.26). Now Lemma 6 of Powell (1978) implies that for
any step on a quadratic program of the form (2.1), under Assumptions 4.1, we have

$$
\|x_k + d_k - x_*\| \le O(\|c_k\|) + O(\|d_k\|^2) + O(\omega_k\|d_k\|), \tag{5.45}
$$

which together with (5.5) and (5.44) implies that

$$
\|e_{k+1}\| \le O(\|e_k\|^2) + O(\omega_k\|e_k\|), \tag{5.46}
$$

when (5.26) is not satisfied. Since, by Lemma 4.1, $\|e_k\|$ and $\|Z_k^T g_k\| + \|c_k\|$ are of the
same order, this relation implies that (5.32) will be satisfied for sufficiently large $k$, since
$\omega_k \to 0$.

Now we must show that the step length in the direction $d_{k+1}$ will be one, which
happens if

$$
\phi_\mu(x_{k+1} + d_{k+1}) \le \phi_\mu(x_{k+1}) + \eta D\phi_\mu(x_{k+1}; d_{k+1}). \tag{5.47}
$$

To do this apply (5.42) to the step from $x_{k+1}$ to $x_{k+1} + d_{k+1}$:

$$
\begin{aligned}
\phi_\mu(x_{k+1} + d_{k+1}) &\le \phi_\mu(x_{k+1}) + \eta D\phi_\mu(x_{k+1}; d_{k+1}) - \gamma\|Z_{k+1}^T g_{k+1}\|^2 \\
&\quad - \tfrac12\rho(1 - \eta)\|c_{k+1}\|_1 + \hat\gamma\|c(x_{k+1} + d_{k+1})\|_1.
\end{aligned}
\tag{5.48}
$$

Now note that by (5.9) and (5.46)

$$
\|c(x_{k+1} + d_{k+1})\| \le O(\|e_{k+1}\|^2) \le O(\|e_k\|^2(\|e_k\| + \omega_k)^2). \tag{5.49}
$$

Note also that by (5.43) and Lemma 4.1

$$
\|c_{k+1}\|_1 \ge \frac{1}{\hat\gamma}\left[\gamma\|Z_k^T g_k\|^2 + \tfrac12\rho(1 - \eta)\|c_k\|_1\right] \ge a_9\|e_k\|^2, \tag{5.50}
$$

for some constant $a_9$. Together, (5.49) and (5.50) imply that the sum of the last three
terms in (5.48) is negative, and (5.47) follows.

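Now we consider step (4) of the algorithm. If $\phi_\mu(x_{k+1}) \le \phi_\mu(x_k)$ then $x_{k+2}$ is accepted
and we are finished. Otherwise, we need to show that

$$
\phi_\mu(x_{k+1} + d_{k+1}) \le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k). \tag{5.51}
$$

Using Lemma 5.2

$$
\begin{aligned}
\phi_\mu(x_{k+1} + d_{k+1}) &= f(x_{k+1} + d_{k+1}) + \lambda_k^T c(x_{k+1} + d_{k+1}) + \mu\|c(x_{k+1} + d_{k+1})\|_1 \\
&\quad - \lambda_k^T c(x_{k+1} + d_{k+1}) \\
&\le L(x_{k+1} + d_{k+1}, \lambda_k) + \hat\gamma\|c(x_{k+1} + d_{k+1})\|_1 \\
&= L(x_k, \lambda_k) + \left[L(x_{k+1}, \lambda_k) - L(x_k, \lambda_k)\right] \\
&\quad + \left[L(x_{k+1} + d_{k+1}, \lambda_k) - L(x_{k+1}, \lambda_k)\right] + \hat\gamma\|c(x_{k+1} + d_{k+1})\|_1 \\
&\le L(x_k, \lambda_k) + \eta g_k^T h_k - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 \\
&\quad + \left[L(x_{k+1} + d_{k+1}, \lambda_{k+1}) - L(x_{k+1}, \lambda_{k+1})\right] \\
&\quad + \left[L(x_{k+1} + d_{k+1}, \lambda_k) - L(x_{k+1} + d_{k+1}, \lambda_{k+1})\right] \\
&\quad - \left[L(x_{k+1}, \lambda_k) - L(x_{k+1}, \lambda_{k+1})\right] + \hat\gamma\|c(x_{k+1} + d_{k+1})\|_1.
\end{aligned}
$$

Applying Lemma 5.2 once more

$$
\begin{aligned}
\phi_\mu(x_{k+1} + d_{k+1}) &\le \phi_\mu(x_k) + \lambda_k^T c_k - \mu\|c_k\|_1 + \eta\left[g_k^T h_k + \lambda_k^T c_k - \mu\|c_k\|_1\right] \\
&\quad - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 - \eta(\lambda_k^T c_k - \mu\|c_k\|_1) \\
&\quad + \left\{\eta g_{k+1}^T h_{k+1} - \gamma\|Z_{k+1}^T g_{k+1}\|^2\right\} + \bar\gamma\|c_{k+1}\|^2 \\
&\quad + \|\lambda_{k+1} - \lambda_k\|_\infty\left(\|c(x_{k+1} + d_{k+1})\|_1 + \|c_{k+1}\|_1\right) + \hat\gamma\|c(x_{k+1} + d_{k+1})\|_1 \\
&\le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k) - (1 - \eta)(\mu\|c_k\|_1 - \lambda_k^T c_k) \\
&\quad - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 + \bar\gamma\|c_{k+1}\|^2 \\
&\quad + \|\lambda_{k+1} - \lambda_k\|_\infty\left(\|c(x_{k+1} + d_{k+1})\|_1 + \|c_{k+1}\|_1\right) + \hat\gamma\|c(x_{k+1} + d_{k+1})\|_1,
\end{aligned}
$$

since both terms inside the curly brackets are non-positive. By (5.9) $\|c_{k+1}\| = O(\|e_k\|^2)$,
and by (5.49) $\|c(x_{k+1} + d_{k+1})\|_1 = o(\|e_k\|^2)$. Therefore

$$
\begin{aligned}
\phi_\mu(x_{k+1} + d_{k+1}) &\le \phi_\mu(x_k) + \eta D\phi_\mu(x_k; d_k) - \rho(1 - \eta)\|c_k\|_1 - \gamma\|Z_k^T g_k\|^2 + \bar\gamma\|c_k\|^2 \\
&\quad + o(\|e_k\|^2).
\end{aligned}
\tag{5.52}
$$

For $k$ sufficiently large, $-\rho(1 - \eta)\|c_k\|_1 + \bar\gamma\|c_k\|^2 \le -\tfrac12\rho(1 - \eta)\|c_k\|_1$. Therefore the sum
of the last three terms in (5.52) is negative, since by Lemma 4.1, $\gamma\|Z_k^T g_k\|^2 + \tfrac12\rho(1 -
\eta)\|c_k\|_1$ is of magnitude $\|e_k\|^2$. This establishes (5.51).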

□ 


6.  Summary  and  Conclusions. 

We have studied the convergence properties of reduced Hessian successive quadratic
programming, using the updating procedure of Coleman and Conn, and a backtracking
line search. We have considered the effect of two merit functions: the $\ell_1$ and the Fletcher
functions. Our work differs from previous studies of these methods in that we have made
no assumptions about the quasi-Newton matrices other than that the initial matrix is
positive definite.

We now summarize, in general terms, the main results of this paper, considering the
$\ell_1$ merit function first. In section 3 it is shown that if the iterates are contained in a
convex set in which the problem satisfies some smoothness and regularity conditions, and
in which $s_k$ and $y_k$ satisfy (2.16) and (2.17), then $\liminf_{k \to \infty}(\|Z_k^T g_k\| + \|c_k\|) = 0$.

The local results proved in section 4 are somewhat stronger. If a local minimizer
is a regular point satisfying the second order sufficiency conditions and if $\{\|\lambda(x_k)\|\}$ is
bounded, then there is a neighborhood of the minimizer such that if an iterate $x_k$ lands
in that neighborhood with $k$ sufficiently large, the sequence converges to that minimizer
R-linearly. The assumption that $\{\|\lambda(x_k)\|\}$ is bounded is stronger than we would like,
but follows from a regularity assumption on the constraints and thus meshes well with
the global theory.

To obtain a superlinear rate of convergence we first impose some conditions on the
choice of the null space basis $Z_k$, which are fairly easy to enforce in practice. Then,
due to the difficulties associated with the Maratos effect, we are forced to make some
modifications to the algorithm in section 5. Use of either modification ensures that
steplengths of one are taken near the solution, but requires some extra cost in terms of
function evaluations. One is to add a second order correction step to the iteration and
the other is a variant of the watchdog technique. We show that both modifications retain
the original local and global convergence properties and guarantee two-step Q-superlinear
convergence. In addition we show that if the second order correction is in effect at every
step, the sequence $x_k + d_k$ converges one-step Q-superlinearly.

For reduced Hessian methods using the Fletcher merit function similar global and local
properties are proved in sections 3 and 4, but only by making additional assumptions on
the boundedness of $B_k^{-1}$. These a priori assumptions on the behavior of the algorithm
are needed to guarantee the boundedness of the merit function weights, and the need
for them makes the convergence theory in sections 3 and 4 significantly weaker for this
merit function than for the $\ell_1$ function. However, in section 5 we show that when the
Fletcher function is used, no modifications are necessary to ensure steplengths of one. It
is then easy to show, under the same conditions on the null space basis, that the rate of
convergence is two-step superlinear.

We believe that this paper, at least in the local and superlinear sections, provides a
realistic and informative analysis of the behavior of reduced Hessian successive quadratic
programming in a practical implementation. We think that similar analysis should be
possible when the update studied by Nocedal and Overton is used, and we hope that it
will prove possible to analyze full Hessian SQP in a similar fashion.